ABSTRACT

This paper presents the design and implementation of a
particular Language Implementation System (LIS) and
investigates the utilization of that system in the
development of artificial languages and their associated
processors.

The Language Implementation System accepts the formal
definition of the syntax (expressed in Backus Naur
Form - BNF) and the semantics (expressed in the programming
language PL/I) of an artificial language, and synthesizes a
processor for that language. The parsers (lexical and
primary) of the processor are highly efficient
Deterministic Push Down Automata (DPDAs) computed from the
language's CLR(k) grammar. The CLR(k) (Comprehensive Left
to Right, looking ahead k symbols) grammars are defined in
the paper, and are shown to include virtually all
"practical" artificial languages. The semantic interpreter
of an artificial language is activated for a particular BNF
rule whenever a syntactic construct defined by that rule is
recognized during the parse of the language's input text.

Applications of the Language Implementation System are
presented, and the system is shown to be applicable not only
to "traditional" artificial languages such as PL/I, Algol,
and Lisp, but also to interactive management
information/decision system languages. Furthermore, the
processors produced by LIS are not limited to traditional
translators, but are also shown to be useful in developing
complex Man-Machine decision systems in which they may be
viewed as computational dispatchers or structured interfaces
between the user and a more complex computational
facility.
Chapter I


Artificial Language Development
and the

Language Implementation System


Languages  die,  too,  like  individuals.... 
They  may  be  embalmed  and  preserved  for 
posterity,  changeless  and  static, 
life-like  in  appearance  but  unendowed 
with the breath of life. While they
live,  however,  they  change. 

Mario  Pei 

The  Story  of  Language 


I.A Introduction

In  this  thesis,  we  present  the  design  and 
implementation  of  a  particular  language  implementation 
system  and  investigate  the  utilization  of  our  system  in  the 
development  of  artificial  languages  and  their  associated 
processors.  By  artificial  languages  we  include,  of  course, 
traditional  programming  languages  such  as  FORTRAN,  ALGOL, 
LISP,  PL/I,  and  COBOL.  However,  adopting  the  attitude  that 
an  artificial  language  represents  a  logical  Man-Machine 
interface, we also include job control languages, end-user
language  facilities,  and  interactive  management 
information/decision  system  languages. 
By language processors, we include the compilers and
interpreters of such "traditional" languages as cited above.
However, here too we expand the traditional terminology to
include interactive compilers and complex Man-Machine
decision systems in which the language processors may be
viewed as computation dispatchers or structured interfaces
that permit the end-user to communicate effectively with a
more complex computational facility.

The  objective  of  the  language  implementation  system 
technology  (DeRemer  first  introduced  the  term  -  DeR  70)  is 
to  provide  the  language  designer  with  a  software  capability 
that  addresses  itself  to  the  problems  of  language  design  and 
specification,  so  that  by  precisely  defining  the  syntax 
(form)  and  semantics  (meaning)  of  a  particular  artificial 
language,  the  designer  may  leave  to  the  language 
implementation  system  the  task  of  implementing  the  processor 
for  his  language.  This  is  indeed  an  ambitious  objective  and 
one  that  has  not  yet  been  fully  realized.  However, 
substantial  advancements  have  been  made  in  the  automation  of 
the  syntactic  recognition  processes  for  artificial 
languages,  and  in  the  subsequent  discussions  we  present  the 
most significant of these by way of their incorporation into
the  design  of  our  Language  Implementation  System  (LIS). 
Furthermore,  we  define  a  framework  for  associating  these 
recognition processes with current and potential
advancements in formal semantic systems so that the
Language Implementation System may have an acceptable
expectation  of  successfully  evolving  towards  the  achievement 
of  the  objective  previously  set  forth. 

As  a  first  approximation,  the  motivation  for  the 
development  of  language  implementation  systems  may  be  seen 
as  evolving  out  of  a  natural  desire  to  make  the  development 
of  processors  for  traditional  languages  more  flexible, 
efficient,  well-structured,  and  reliable  than  has  typically 
been  the  case  with  manual  or  semi-automatic  techniques. 
However,  it  is  part  of  the  present  thesis  that  a 
sufficiently  comprehensive  language  implementation  system 
will  not  only  facilitate  the  development  of  traditional 
language  processors,  but  will  also  support  the  development 
of  languages  and  processors  for  special  purpose  end-user 
facilities  and  Man-Machine  decision  systems  in  which  a 
primary  consideration  is  the  ability  of  such  systems  to 
adapt to an ever changing problem environment. Thus, to the
extent  that  the  domain  of  feasible  problem  environments  to 
which  such  computation  may  be  successfully  applied  is  a 
function  of  the  supporting  language  facilities,  it  is  our 
belief  that  language  implementation  systems  such  as  our  own 
will  play  a  significant  role  in  the  expansion  of  this 
domain.


In  the  remainder  of  this  chapter,  we  address  several 
topics  of  an  introductory  nature.  In  Section  I.B,  we 
present  a  representative  model  of  a  processor  for  an 
artificial  language  and  use  this  model  to  define  the 
general  functional  capability  of  the  Language  Implementation 
System. In Section I.C, we get more specific and identify
the system objectives and design criteria of LIS. In
Section I.D, we suggest the contributions made by this
thesis to the development and application of language
implementation system technology. In Section I.E, we give
the reader a brief overview of our approach in the remainder
of the thesis.
I.B Language Processing Models and the
Language Implementation System

In this section, we present a particular language
processing model and use that model to identify more clearly
the basic objectives of the Language Implementation System.
The model that we choose is the classical seven phase
compiler model as presented, for example, by Donovan (Don
72). We use the model of a compiler because most of the
important issues of language processor design arise in the
context of a compiler. As previously indicated, we shall
subsequently discuss the application of LIS to other
language processors, so that the present discussion is not
meant to imply a particular limitation on the system's
applicability.

The model that we use is given in Figure I.1, and is
strictly a functional model, as the associated data bases
have been omitted. Within the model, the functions of the
phases may be identified as follows:

1. Lexical Analysis

The process of recognizing the basic elements,
or lexical constructs, of the language as they
are encountered in the submitted program
(Input Text).

2.  Syntax  Analysis 

The process whereby the syntactic structure
or phrase structure of the submitted program
is analyzed in terms of the lexical constructs
recognized by Phase 1.
3. Interpretation

The process of associating semantics, or
meaning, with the phrase structure as analyzed
in Phase 2. The semantic interpretation is in
the form of an intermediate target language
representation and a set of data bases.


4. Machine Independent Optimization

The  process  whereby  machine  independent 
optimizations  are  performed  on  the 
intermediate  target  language  representation  so 
as  to  produce  a  more  efficient  semantic 
representation  of  the  submitted  program. 

5. Storage Assignment

The  process  of  determining  the  storage 
allocation  for  the  program's  data  elements  and 
the  compiler  generated  data  elements. 


6. Code Generation

The process of generating the assembly code
representation of the submitted program,
with symbolic references.

7. Assembly

The process of resolving symbolic references
and generating the machine language
representation of the submitted program.


As indicated in Figure I.1, the first three phases are
largely  language  dependent  and  machine  independent, 
whereas  the  subsequent  phases  are  largely  language 
independent  and  machine  dependent.  Thus,  the  theory  would 
imply  that  for  a  given  machine,  one  need  only  be  concerned 
with  the  first  three  phases  when  implementing  a  new 
language,  and  conversely,  with  only  phases  four  through 
seven  when  transporting  an  existing  language  to  a  new 
computing  environment.  Of  course,  such  idealizations  have 
rarely  been  realized. 


Given  this  model,  we  may  now  begin  to  identify  more 
clearly the basic objectives of LIS. Stated most simply,
the objectives of LIS are to automate completely the
lexical analysis and syntax analysis phases, while providing
a  convenient  framework  for  associating  semantic 
interpretation  with  the  syntactic  constructs  of  the  language 
being  implemented  (Phase  3  -  Interpretation). 
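As a rough illustration of this division of labor, the sketch below attaches a semantic routine to each numbered production and calls it whenever the parser reports that the production has been applied, passing it the values of the production's right-part symbols. Everything here is hypothetical: the decorator `rule`, the table `SEMANTICS`, the three-production grammar, and the hand-written shift/apply trace stand in for what a generated parser would supply. In LIS itself the semantics are written in PL/I, not Python.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table: production number -> (left part, right-part length, routine)
SEMANTICS: Dict[int, Tuple[str, int, Callable[..., object]]] = {}

def rule(number: int, left: str, right_len: int):
    """Register a semantic routine for production `number` (illustrative only)."""
    def register(action: Callable[..., object]) -> Callable[..., object]:
        SEMANTICS[number] = (left, right_len, action)
        return action
    return register

@rule(1, "E", 3)   # E -> E + T
def add(e, _plus, t):
    return e + t

@rule(2, "E", 1)   # E -> T
def copy_term(t):
    return t

@rule(3, "T", 1)   # T -> i
def literal(i):
    return i

Step = Tuple[str, object]

def interpret(trace: List[Step]) -> object:
    """Replay a shift/apply trace, calling the routine of each applied production."""
    values: List[object] = []
    for kind, arg in trace:
        if kind == "shift":
            values.append(arg)                # value of the recognized input symbol
        else:                                 # "apply": arg is a production number
            _left, n, action = SEMANTICS[arg]
            operands = values[len(values) - n:]
            del values[len(values) - n:]      # pop the right-part values
            values.append(action(*operands))  # push the left-part value
    return values[-1]

# Hand-written trace for the input "2 + 3" (a real parser would produce this).
TRACE: List[Step] = [("shift", 2), ("apply", 3), ("apply", 2),
                     ("shift", "+"), ("shift", 3), ("apply", 3), ("apply", 1)]
print(interpret(TRACE))  # 5
```

The point of the design is that the recognizer never needs to know what the routines do; it only announces which production it has just recognized.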

In  the  next  section,  we  identify  the  specific 
objectives  and  design  criteria  of  LIS. 
I.C Objectives and Design Criteria of the

Language Implementation System


The  objectives  of  the  Language  Implementation  System 
are  to  facilitate  the  design  of  artificial  languages  and  the 
implementation of their processors by:

1 .  Providing  analysis  procedures  capable  of 
detecting  and  identifying  syntactic  ambiguity, 
as  well  as  for  verifying  certain 
characteristics  required  of  well  structured 
artificial  languages. 

2.  Providing  the  capability  to  automate 
completely  the  implementation  of  the  syntactic 
recognition  processes  of  artificial  languages 
satisfying  the  criteria  implied  above. 

3.  Providing  a  convenient  structure  within  which 
semantic interpretation may be associated with
the  syntactic  constructs  of  those  languages 
whose  recognition  processes  have  been 
automated  by  the  system. 


With respect to the first two objectives, the system
should be equally effective when used by either language
designers or experienced processor implementers. Since the
automation procedures take their input from a formal
specification of the syntax of the particular language
(expressed in Backus Naur Form), no particular experience
with processor implementation should be necessary, or even
useful. However, since formal semantic systems have not yet
been developed to the point where they are both sufficiently
general and efficient to be incorporated into
commercially marketable language processors, our approach
with regard to objective three must be to provide a flexible
framework within which the designer/implementer may employ
the capabilities of a general purpose programming language
in associating semantic interpretation with the constructs
of his language.


The  following  design  criteria  were  established  in 
support of the above objectives:

1.  The  system  must  be  capable  of  handling  a  wide 
variety  of  programming  languages.  In  the 
development  of  new  languages,  the  system 
should  serve  as  a  design  tool  by  pointing  out 
areas  of  syntactic  ambiguity  and  complexity, 
since  these  will  be  bothersome  to  both 
implementer  and  user.  In  its  application  to 
existing  languages,  the  system  should  be 
sufficiently  general  as  to  admit  the 
representation  of  most  of  these  in  such  a  way 
as  to  require  little  or  no  rework  of  the 
language's  existing  syntax. 

2. The syntactic recognizers produced by the
system must come with an "error-free"
guarantee. That is, given that the system has
claimed to have generated a recognizer from a
particular specification, we require that:

a. The recognizer always parse input text
generated by the given specification.

Since we also require that the system
not generate recognizers for ambiguous
languages, we therefore know that the
resulting parse will be the only parse.

b. The recognizer always reject input text
not generated by the given
specification.

Satisfying this criterion implies that during
processor development, all processing errors
will, a priori, be the result of errors in the
semantic specification.
Of course, the specification itself may be in
error in the sense that it generates text that
the designer "did not have in mind", and
although the system will not be able to handle
this problem directly, it must be designed in
such a way as to allow the user to introduce
change into his language in a convenient and
timely manner. This requirement leads to the
next design criterion.

3. The syntactic meta-language that constitutes
the input to the system must be palatable to
the system user. Although this characteristic
is somewhat difficult to define precisely, it
carries with it the following notions:

a. Non-procedural (independence as to the
ordering of the meta-language rules
within a particular definition).

b. Free format (within the meta-language
rules).

c. The ability to handle long (many
character) symbols, so as to allow the
user to be descriptive in his naming
conventions.

d. A minimal number of reserved symbols.

In addition, we require that the (system
acceptable) syntax specification be such that
the same document may appropriately be used to
define the syntax of a language to the users
of that language. Because of this requirement,
we insist that the syntactic meta-language be
high-level and resemble, as closely as
possible, the most universally accepted
meta-language currently found in the computer
literature.

4. Procedures must exist to provide for syntax
directed error detection, reporting, and
recovery when illegal text is encountered
during syntax analysis of programs written in
the specified language. Although these
procedures must be general enough to report
and recover from (continue parsing after) most
context-free syntax errors, a facility must
also be provided to allow the language
designer/implementer to substitute his own
reporting and recovery procedures in those
cases in which the general capability is found
to be inappropriate.

5. The syntactic recognizers produced by the
system must be time efficient in the sense
that their speed of recognition for a
particular language must be comparable to, or
faster than, alternate methods of
implementation for the same language.

6. The syntactic recognizers produced by the
system must be space efficient in the sense
that  the  space  required  for  their 
representation  must  be  comparable  to,  or  less 
than,  alternate  methods  of  implementation  for 
the  same  language. 

7.  The  structure  chosen  for  language  definition 
must  provide  a  convenient  framework  for  the 
specification  of  semantics.  In  particular, 
the  semantics  of  a  particular  syntactic 
construct should be expressible in-line with
the  syntax  of  that  construct. 

8.  The  error  messages  delivered  by  the  system 
when  processing  a  particular  definition  should 
be  precise  and  relative  to  the  input 
specification.  This  will  allow  language 
debugging  to  proceed  on-line  in  most 
instances. 


By  specifying  the  above  design  criteria,  we  identify, 
in  a  general  way,  the  fundamental  capabilities  of  the 
Language  Implementation  System.  With  the  exception  of  the 
syntax  directed  error  handling  capability  (which  has  been 
designed),  all  of  the  criteria  have  been  met  in  the  current 
implementation  of  the  system. 
I.D Thesis Contributions


The theoretic foundation of the Language Implementation
System is derived from the work by Knuth (Kn 65) and DeRemer
(DeR 69) on LR(k) systems. Knuth developed the theoretic
concepts of left to right translation when he defined the
LR(k) grammars, although it was not until DeRemer's work
that LR(k) systems were seen as having high potential in
the area of "practical" language processors. These two
works are now regarded as classics in the application of
automata theory to language translation, and are strongly
recommended to the reader demanding a rigorous treatment of
the theoretic foundation of LIS.


The purpose of this thesis is not to go over the
theoretic development of LR(k) systems to any significant
extent. Rather, we believe that our contributions to the
development of language implementation system technology
may be identified as follows:

1. The definition of a subset of the LR(k)
grammars, called the Comprehensive LR(k)
grammars  (CLR(k)).  We  call  the  grammars 
"comprehensive"  because  they  include 
virtually  all  of  the  "practical"  grammars  with 
which  we  have  come  in  contact. 

2. The design and implementation of a
commercially viable language implementation
system based on the CLR(k) grammars. LIS is
commercially viable in the sense that
Honeywell has found it appropriate to use the
system, as implemented on Multics, to generate
parsers for languages to be implemented on
other computers.

3.  The  application  of  LIS  to  the  development  of 
various  artificial  languages  and  their 
associated  processors. 
I.E Thesis Approach


As already indicated, it is the intent of this thesis
to complement the work of Knuth and DeRemer by developing
a well-engineered, commercially viable language
implementation system based on LR(k) techniques, and to
investigate the utilization of our system in the development
of artificial languages and their associated processors. We
are in an enviable position, with respect to LR(k) systems,
of having a well developed theoretic base. However, our
approach in the development of the fundamental algorithms of
LIS in Chapter III is to present the notions of finite state
machines and deterministic push down automata in terms that
essentially reduce to intuitive notions. That is, given the
luxury of a well developed theory, we choose to bypass that
theory and appeal to the reader's basic analytic skills,
often using examples to illustrate the algorithms.

Even with our emphasis on design and application,
however, there remains a moderate amount of terminology and
fundamental conceptual material that is essential to our
subsequent discussions. This is presented in the first part
of Chapter II. Also in Chapter II, we describe the basic
structure of LIS from the point of view of artificial
language/processor development, and indicate the way in
which the fundamental algorithms to be presented in Chapter
III are combined to achieve the functional capability of
LIS.

In  Chapter  III,  we  develop  the  fundamental  algorithms 
of  LIS. 

In  Chapter  IV,  we  cover  a  variety  of  issues,  including 
the  efficiency  of  the  parsers  produced  by  LIS,  the  design 
of  our  syntax  directed  error  handling  procedures,  the 
hierarchy  of  LR(k)  systems  and  how  the  CLR(k)  grammars 
relate  to  that  hierarchy,  and  a  discussion  of  some  of  the 
more  significant  applications  of  LIS  to  date.  Chapter  IV 
concludes  the  main  portion  of  the  thesis,  and  in  the  last 
sections  of  the  chapter  we  summarize  our  work  and  suggest 
areas  of  future  investigation  that  can  be  expected  to  have 
significant impact on language implementation system
technology. 

The  thesis  is  structured  so  that  by  reading  Chapters  I 
through  IV,  the  reader  can  grasp  the  essential  subject 
matter  that  we  are  attempting  to  present.  In  the 
appendices,  we  present  many  of  the  details  of  the  system 
design,  implementation,  and  applications  that  are  likely  to 
be  of  interest  only  to  individuals  actively  engaged  in  the 
development  and  application  of  language  implementation 
systems. Appendix A is the LIS User Reference Manual, which
describes LIS from the point of view of the language
designer/implementer. In Appendix B, we give a macro
description of the design and implementation of LIS,
complete with flowcharts of the major procedures and
descriptions of the major data structures. In Appendix C,
we give an application of LIS to the implementation of a
translator from a block structured language into Dennis's
Common Base Language (a particular formal semantic system).
In Appendix D, we give the syntax of PL/I from which we
generated CLR(1) lexical and primary parsers. In Appendix
E, we give an example of the application of LIS to the
development of an interactive decision support system.
Chapter II


Fundamental Language Concepts

and  the 

Structure  of  the  Language  Implementation  System 


II.A Introduction

As  indicated  in  Chapter  I,  this  thesis  is  oriented 
towards  the  design  and  application  of  our  Language 
Implementation  System,  as  opposed  to  a  rigorous  development 
of  its  underlying  theory.  Even  with  this  design  and 
application  orientation,  however,  there  remains  a  moderate 
amount  of  terminology  and  fundamental  conceptual  material 
that is essential to our subsequent discussions. In Section
II.B, we give a treatment of this material and acknowledge
that most of this treatment is from DeRemer's thesis (DeR
69).

In  Section  II.C,  we  present  the  structure  of  the 
Language  Implementation  System,  and  discuss  its  utilization 
in  the  development  of  artificial  languages  and  their 
associated  processors.  In  addition,  we  describe  the  way  in 
which  the  fundamental  algorithms  presented  in  Chapter  III 
are  combined  to  achieve  the  functional  capability  of  LIS. 




II.B Fundamental Concepts and Terminology

We begin by defining terms and notation. We assume the
reader is familiar with the properties of symbols, strings
of symbols, languages, finite state machines (FSMs), formal
grammars, and deterministic push down automata (DPDAs).

A context-free grammar (CFG) is a quadruple (Vt, Vn, S,
P) where Vt is a finite set of symbols called terminals, Vn
is a finite set of symbols distinct from those in Vt called
non-terminals, S is a distinguished member of Vn called the
starting symbol, and P is a finite set of pairs called
productions. Each production is written A->w and has a left
part A in Vn and a right part w in V*, where V = Vn U Vt. V*
denotes the set of all strings composed of symbols in V,
including the empty string.

Without loss of generality, we adopt the conventions that (i)
the productions are arbitrarily numbered from 0 to s, and
(ii) the zero-th production is of the form S -> !- S' -!,
where S' is a subordinate starting symbol, and S and
the "pad" symbols !- and -! appear in none of the other
productions. In addition, we associate the symbol #p with
the p-th production, so that the production may be
conceptually written as (p) A->w #p. The #-symbols are
alternatively called apply symbols, and refer to the
application of the associated production.


The following is an example of a context-free grammar:

G = ({(, ), i, ^, +, !-, -!}, {S, E, T, P}, S, P)

where P consists of the following productions:

(0) S -> !- E -!
(1) E -> E + T
(2) E -> T
(3) T -> P ^ T
(4) T -> P
(5) P -> i
(6) P -> ( E )
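To make the conventions concrete, the sketch below encodes the example grammar G as Python data and checks that Vt and Vn are disjoint, that every left part lies in Vn and every right part in V*, and that S and the pad symbols occur only in production 0. The encoding (list index as production number, "^" as the operator symbol of production 3) and the function name are assumptions of this sketch, not part of LIS.

```python
# Example grammar G = (Vt, Vn, S, P); production p is PRODUCTIONS[p].
VT = {"(", ")", "i", "^", "+", "!-", "-!"}
VN = {"S", "E", "T", "P"}
START = "S"
PRODUCTIONS = [
    ("S", ["!-", "E", "-!"]),  # (0)
    ("E", ["E", "+", "T"]),    # (1)
    ("E", ["T"]),              # (2)
    ("T", ["P", "^", "T"]),    # (3)
    ("T", ["P"]),              # (4)
    ("P", ["i"]),              # (5)
    ("P", ["(", "E", ")"]),    # (6)
]

def check_conventions(vt, vn, start, prods):
    """Return a list of violations of the CFG definition and conventions."""
    problems = []
    if vt & vn:
        problems.append("Vt and Vn overlap")
    if start not in vn:
        problems.append("S is not a non-terminal")
    vocabulary = vt | vn                       # V = Vn U Vt
    for p, (left, right) in enumerate(prods):
        if left not in vn:
            problems.append(f"({p}) left part {left!r} not in Vn")
        for sym in right:
            if sym not in vocabulary:
                problems.append(f"({p}) symbol {sym!r} not in V")
    # Production 0 must have the form S -> !- S' -!
    left0, right0 = prods[0]
    if left0 != start or len(right0) != 3 or right0[0] != "!-" or right0[2] != "-!":
        problems.append("production 0 is not of the form S -> !- S' -!")
    # S and the pad symbols may appear in no other production.
    reserved = {start, "!-", "-!"}
    for p, (left, right) in enumerate(prods[1:], start=1):
        if reserved & ({left} | set(right)):
            problems.append(f"({p}) uses S or a pad symbol")
    return problems

print(check_conventions(VT, VN, START, PRODUCTIONS))  # []
```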

If M->q is a production, an immediate derivation of one
string a = vqb from another a' = vMb is written a'->a. We
say that a is immediately derivable from a' via application
of the production M->q to a particular occurrence of M in
a'. The reflexive transitive closure of this relation is a
derivation and is written a'->*a, which means there exist
strings a0, a1, ..., an such that a' = a0->a1->...->an = a for
n >= 0. A right derivation, written a'->r*a, is one in which
for i = 1, ..., n each a(i) is immediately derivable from
a(i-1) via application of a production to the rightmost
non-terminal in a(i-1). We choose the right derivation as
our canonical derivation.
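The definition of a right derivation can be checked mechanically on the example grammar G. The sketch below (grammar encoding and function name are illustrative assumptions) applies a given list of production numbers, each to the rightmost non-terminal of the current string, and prints the resulting right derivation of the sentence !- i + i -!. Reversing the list gives the canonical parse of that sentence, in the sense defined later in this section.

```python
PRODUCTIONS = [
    ("S", ["!-", "E", "-!"]),  # (0)
    ("E", ["E", "+", "T"]),    # (1)
    ("E", ["T"]),              # (2)
    ("T", ["P", "^", "T"]),    # (3)
    ("T", ["P"]),              # (4)
    ("P", ["i"]),              # (5)
    ("P", ["(", "E", ")"]),    # (6)
]
NONTERMINALS = {"S", "E", "T", "P"}

def right_derivation(sequence):
    """Rewrite the rightmost non-terminal at each step; return every form a0..an."""
    form = ["S"]
    forms = [form]
    for p in sequence:
        left, right = PRODUCTIONS[p]
        positions = [k for k, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions or form[positions[-1]] != left:
            raise ValueError(f"production {p} does not rewrite the rightmost non-terminal")
        k = positions[-1]
        form = form[:k] + right + form[k + 1:]
        forms.append(form)
    return forms

SEQUENCE = [0, 1, 4, 5, 2, 4, 5]
for f in right_derivation(SEQUENCE):
    print(" ".join(f))             # last line printed: !- i + i -!
# The canonical parse is the reverse of the production sequence.
print(list(reversed(SEQUENCE)))    # [5, 4, 2, 5, 4, 1, 0]
```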

A terminal string is one consisting entirely of
terminals. A sentential form is any string derivable from
S. A sentence is any terminal sentential form. The language
L(G) generated by G is the set of sentences, i.e., L(G) =
{n in Vt* | S->*n}. A right sentential form, which we choose
as our canonical form, is any string canonically derivable
from S.

Let M->q be the p-th production of a CFG, and let
a' = vMb and a = vqb be canonical forms such that there exists a
canonical derivation S->r*a'->a (so that M is the rightmost
non-terminal of a', and b consists of terminals only). Then
vq#p is a characteristic string of a.
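For the example grammar G, the characteristic strings along a particular right derivation can be listed directly from this definition: at each step a' = vMb -> a = vqb, keep everything up to and including the newly substituted right part q, drop the terminal tail b, and append #p. The sketch below does this for the derivation of !- i + i -! by the production sequence 0, 1, 4, 5, 2, 4, 5; the encoding and function name are assumptions for illustration.

```python
PRODUCTIONS = [
    ("S", ["!-", "E", "-!"]),  # (0)
    ("E", ["E", "+", "T"]),    # (1)
    ("E", ["T"]),              # (2)
    ("T", ["P", "^", "T"]),    # (3)
    ("T", ["P"]),              # (4)
    ("P", ["i"]),              # (5)
    ("P", ["(", "E", ")"]),    # (6)
]
NONTERMINALS = {"S", "E", "T", "P"}

def characteristic_strings(sequence):
    """Return v q #p for each step a' = vMb -> a = vqb of a right derivation."""
    form = ["S"]
    result = []
    for p in sequence:
        left, right = PRODUCTIONS[p]
        positions = [k for k, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions or form[positions[-1]] != left:
            raise ValueError(f"production {p} does not rewrite the rightmost non-terminal")
        k = positions[-1]                # M is the rightmost non-terminal of a'
        v, b = form[:k], form[k + 1:]    # b holds terminals only
        result.append(" ".join(v + right + [f"#{p}"]))
        form = v + right + b             # a = v q b
    return result

for s in characteristic_strings([0, 1, 4, 5, 2, 4, 5]):
    print(s)
```

The printed strings run from `!- E -! #0` down to `!- i #5`. Note that the tail b never appears: every characteristic string ends at its #-symbol.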


Loosely speaking, a parse of a string is some
indication of how that string was derived. In particular, a
canonical parse of a sentential form, a, is the reverse of
the sequence of productions (or equivalently, the numbers
thereof) used in a canonical derivation of a. We refer to
the action of determining a parse as parsing, the
determination constitutes a grammatical analysis, and a
parsing algorithm is called a parser.


For our purposes, a Finite State Machine (FSM) may be
considered to be a set of states and a set of state
transitions from those states. Each state transition has an
associated transition symbol, which defines an exit from the
state to which the transition belongs. The symbol may be a
terminal symbol, a non-terminal symbol, or a #-symbol. In
the first two cases, the state transition also has an
associated destination state. The destination state
identifies the state that is entered upon exiting the
original state on the transition symbol. A state having only
transitions on terminal and non-terminal symbols is called a
read state. A state having a single transition, that
transition being on a #-symbol, is called an apply state. A
state having more than one transition, at least one of which
is on a #-symbol, is called an inadequate state.
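The three kinds of state can be told apart from a state's transitions alone. In the sketch below, a state is assumed to be encoded as a mapping from transition symbols to destination states, with #-symbols written "#p" and given no destination. The three sample states are modeled on the example grammar G (state and destination names are made up): after reading P, the machine may either read "^" (production 3) or apply production 4, so that state is inadequate. This is exactly the kind of conflict that looking ahead at further input is meant to resolve.

```python
from typing import Dict, Optional

# Hypothetical encoding: transition symbol -> destination state,
# with #-symbols written "#p" and having no destination (None).
Transitions = Dict[str, Optional[str]]

def classify(transitions: Transitions) -> str:
    """Return 'read', 'apply' or 'inadequate' per the definitions in the text."""
    on_hash = [s for s in transitions if s.startswith("#")]
    if not on_hash:
        return "read"            # only terminal / non-terminal transitions
    if len(transitions) == 1:
        return "apply"           # a single transition, on a #-symbol
    return "inadequate"          # several transitions, at least one on a #-symbol

# States modeled on the example grammar G (names are illustrative):
after_pad = {"E": "s2", "T": "s3", "P": "s4", "i": "s5", "(": "s6"}
after_i = {"#5": None}
after_P = {"^": "s7", "#4": None}

for name, state in [("after_pad", after_pad), ("after_i", after_i), ("after_P", after_P)]:
    print(name, classify(state))
```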

A series of transitions leading through an FSM from
state N1 to state N2, ..., to state Nk is called a path from
N1 to Nk. Every such path spells out a unique string of
input symbols (i.e., an input string) in the obvious way. An
FSM accepts a given string T if and only if there exists at
least one path that begins at a (specially marked) starting
state, spells out T, and ends at a (specially marked)
terminal state. The set of all strings accepted by an FSM
is referred to as the set that is recognized by that FSM.
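Because acceptance only requires that at least one suitable path exist, it can be decided without enumerating paths. The sketch below keeps, after each input symbol, the set of states that some path spelling the input read so far can reach, and accepts if that set finally contains a terminal state. The machine used (a nondeterministic FSM over a and b accepting the strings that end in "ab") and its encoding are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> list of (symbol, destination) transitions.
Fsm = Dict[str, List[Tuple[str, str]]]

def accepts(fsm: Fsm, start: str, terminals: Set[str], text: List[str]) -> bool:
    """True iff some path from `start` spells out `text` and ends in a terminal state."""
    current = {start}                       # states reachable by paths spelling ""
    for symbol in text:
        current = {dest for state in current
                   for (label, dest) in fsm.get(state, []) if label == symbol}
        if not current:                     # no path spells out this prefix
            return False
    return bool(current & terminals)

FSM: Fsm = {"q0": [("a", "q0"), ("b", "q0"), ("a", "q1")],
            "q1": [("b", "q2")]}

print(accepts(FSM, "q0", {"q2"}, list("aab")))  # True
print(accepts(FSM, "q0", {"q2"}, list("aba")))  # False
```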

States  M  and  N  are  said  to  be  equivalent  if  and  only  if 
for  each  input  string  T  spelled  out  by  some  path  from  M  (N), 
such  that  the  path  also  spells  out  the  string  T',  there 
exists  a  path  from  N  (M)  which  spells  out  the  same  two 
strings  T  and  T',  respectively. 

An FSM is said to be deterministic if and only if it
has a single starting state and from each state there is at
most one transition under each distinct input symbol;
otherwise, it is said to be nondeterministic. A
deterministic FSM is said to be reduced if and only if every
state is accessible from the starting state, some terminal
state is accessible from every state, and no two states are
equivalent.

A Characteristic Finite State Machine (CFSM) of a
context-free grammar G is a reduced, deterministic FSM which
recognizes the set of characteristic strings of G.
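The first two conditions for a reduced machine are plain graph reachability questions: search forward from the starting state, and search backward from the terminal states. The sketch below reports the states that violate either condition; the third condition (no two equivalent states) needs a state-minimization algorithm and is not checked here. The encoding, function names, and the small example machine are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Set

def reachable(graph: Dict[str, Set[str]], sources: Set[str]) -> Set[str]:
    """Breadth-first search: every node reachable from some source."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reduction_violations(dfa: Dict[str, Dict[str, str]], start: str,
                         terminals: Set[str]) -> Dict[str, Set[str]]:
    """States violating the two accessibility conditions of a reduced FSM."""
    states = set(dfa) | {start} | set(terminals)
    states |= {dest for row in dfa.values() for dest in row.values()}
    forward = {s: set(dfa.get(s, {}).values()) for s in states}
    backward: Dict[str, Set[str]] = {s: set() for s in states}
    for src, row in dfa.items():
        for dest in row.values():
            backward[dest].add(src)         # reverse every transition
    return {
        "inaccessible": states - reachable(forward, {start}),
        "no_terminal_reachable": states - reachable(backward, set(terminals)),
    }

EXAMPLE = {"A": {"x": "B"}, "B": {"y": "C"}, "D": {"x": "C"}, "E": {"z": "E"}}
print(reduction_violations(EXAMPLE, "A", {"C"}))
```

Here D and E cannot be reached from A, and no terminal state can be reached from E, so this machine is not reduced.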

We often think of a deterministic FSM as a physical
machine, rather than as an abstract model, and this leads to
the following terminology. To determine if a given FSM
accepts a given string T, we say we initialize the machine
(i.e. start it in its starting state), apply it to T, and
determine if T moves it through a sequence of
states to a terminal state. The machine is said to read
the symbols in T from an input tape, to enter first one
state and then the next, and to output symbols onto an
output tape. If after reading the last symbol of T the
machine outputs a "1", then it accepts T. However, if at
that time it outputs a "0", it does not accept T. The
machine stops reading whenever it enters a state with no
transition under the next symbol to be read.
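The acceptance procedure can be sketched in Python as follows. This is a minimal illustration, not LIS code: it assumes a deterministic FSM stored as a dictionary from (state, input symbol) pairs to next states, and lets the return values False and True stand for the outputs "0" and "1". The function accepts and the two-state example machine (states S and I, recognizing simple identifiers) are hypothetical.

```python
import string
from typing import Dict, Set, Tuple

# (state, input symbol) -> next state; hypothetical representation.
Transitions = Dict[Tuple[str, str], str]


def accepts(delta: Transitions, start: str, terminals: Set[str], text: str) -> bool:
    """Initialize the machine, apply it to text, and report its final output."""
    state = start                        # initialize: enter the starting state
    for symbol in text:                  # read symbols from the input tape
        nxt = delta.get((state, symbol))
        if nxt is None:                  # no transition under the next symbol:
            return False                 # stop reading and output "0"
        state = nxt                      # enter the next state
    return state in terminals            # output "1" only in a terminal state


# Hypothetical example: a letter a->z followed by any number of a->z, 0->9, or _.
IDENT: Transitions = {}
for ch in string.ascii_lowercase:
    IDENT[("S", ch)] = "I"
for ch in string.ascii_lowercase + string.digits + "_":
    IDENT[("I", ch)] = "I"

print(accepts(IDENT, "S", {"I"}, "x_1"))   # True
print(accepts(IDENT, "S", {"I"}, "1x"))    # False
```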

A Deterministic Push Down Automaton (DPDA) is a machine
that consists of an input tape, an output tape, a finite
control, and a push down stack.

The finite control can be thought of as a program
consisting of instructions pertaining to the reading of
symbols from the input tape and the outputting of symbols to
the output tape; the storage, interrogation, and removal of
items on the stack; and jumps from one point in the program
to another. The control can be represented by a transition
graph whose nodes are called states, and whose labelled
arrows are called transitions.

Each state represents a point in the program which can
be jumped to, and it has a name which is given inside the
node. There is a unique starting state and a unique
terminal state.

Each transition implies one of four kinds of
instructions, the interpretations of which are indicated
next. If the machine enters state N having a transition to
state M, then, if the label of the transition is:

(1) a symbol s, the machine reads the next symbol, and if the
symbol read is s, it then enters state M;

(2) "push i", the machine pushes the item i on the stack and
then enters state M;

(3) "pop n, out p", the machine pops the top n items off the
stack, outputs p, and then enters state M; or

(4) "top i", the machine compares item i with the top item on
the stack, and if they are the same, it enters state M.

The following two conditions are sufficient to
guarantee determinism: (1) any state having a transition
under either "push i" or "pop n, out p" may have no other
transitions, and (2) any other state must have either every
transition under a symbol, or every one under "top i" for
some item i.
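The two sufficient conditions can be checked mechanically. The following Python sketch assumes a hypothetical encoding of the finite control as a dictionary from state names to lists of (label, target) pairs, with labels ("read", s), ("push", i), ("pop", n, p), and ("top", i). It also assumes, as the conditions implicitly do, that no two transitions of one state carry the same label; the function name is our own.

```python
from typing import Dict, List, Tuple

# Hypothetical finite-control encoding: state -> [(label, target state), ...], with labels
# ("read", s), ("push", i), ("pop", n, p), or ("top", i).
Control = Dict[str, List[Tuple[tuple, str]]]


def satisfies_determinism_conditions(control: Control) -> bool:
    for transitions in control.values():
        labels = [label for label, _target in transitions]
        kinds = {label[0] for label in labels}
        if len(set(labels)) != len(labels):
            return False                 # assumed: distinct labels within a state
        if kinds & {"push", "pop"}:
            if len(transitions) != 1:
                return False             # condition (1): no other transitions allowed
        elif len(kinds) > 1:
            return False                 # condition (2): all reads or all "top i" tests
    return True


GOOD: Control = {
    "q0": [(("push", "$"), "q1")],
    "q1": [(("read", "a"), "q2"), (("read", "b"), "q3")],
    "q2": [(("top", "$"), "q3"), (("top", "A"), "q1")],
}
MIXED: Control = {"q": [(("read", "a"), "r"), (("top", "A"), "s")]}

print(satisfies_determinism_conditions(GOOD), satisfies_determinism_conditions(MIXED))  # True False
```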

The initial configuration of a DPDA is as follows: it
is started in its starting state with the input string on
its input tape, with its input head (reading device) over
the leftmost symbol of the input string, and with its stack
empty. The final configuration is: the input head one place
to the right of the rightmost symbol of the input string,
the stack empty, and the machine in its terminal state.

The similarity of DPDAs and FSMs is emphasized if we
note that a DPDA which never uses its stack is equivalent to
some FSM. This leads us to think of a DPDA as being
based on some FSM. We think of this FSM as reading symbols,
as usual, but interspersed between some of the reads are
some "bookkeeping" operations involving the stack, and these
operations effect some of the state changes of the FSM.
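The four kinds of instructions and the initial and final configurations can be made concrete with a small simulator. This Python sketch is an illustration under assumptions of our own: the finite control is a dictionary from state names to lists of (label, target) pairs with labels ("read", s), ("push", i), ("pop", n, p), and ("top", i); a "read" consumes the next symbol only when it matches s; and max_steps is a safeguard against looping control programs that is not part of the model. The example machine ANBN, which accepts a...ab...b with equally many a's and b's and outputs "match" for each pair, is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: state -> [(label, target state), ...].
Control = Dict[str, List[Tuple[tuple, str]]]


def run_dpda(control: Control, start: str, terminal: str, text: str,
             max_steps: int = 10_000) -> Optional[List[str]]:
    """Return the output tape if the DPDA accepts text, otherwise None."""
    state, head, stack, out = start, 0, [], []   # initial configuration
    for _ in range(max_steps):
        if state == terminal and head == len(text) and not stack:
            return out                           # final configuration reached
        for label, target in control.get(state, []):
            kind = label[0]
            if kind == "read" and head < len(text) and text[head] == label[1]:
                head += 1                        # (1) read symbol s
            elif kind == "push":
                stack.append(label[1])           # (2) push i
            elif kind == "pop" and len(stack) >= label[1]:
                del stack[len(stack) - label[1]:]
                out.append(label[2])             # (3) pop n, out p
            elif kind == "top" and stack and stack[-1] == label[1]:
                pass                             # (4) top i: compare only
            else:
                continue                         # this transition does not apply
            state = target
            break
        else:
            return None                          # no applicable transition: reject
    return None                                  # step limit (guards against loops)


ANBN: Control = {
    "q0": [(("push", "$"), "q1")],               # bottom-of-stack marker
    "q1": [(("read", "a"), "q2")],
    "q2": [(("push", "A"), "q3")],
    "q3": [(("read", "a"), "q2"), (("read", "b"), "q4")],
    "q4": [(("top", "A"), "q5")],
    "q5": [(("pop", 1, "match"), "q6")],
    "q6": [(("top", "A"), "q7"), (("top", "$"), "q8")],
    "q7": [(("read", "b"), "q4")],
    "q8": [(("pop", 1, "done"), "qf")],
}

print(run_dpda(ANBN, "q0", "qf", "aabb"))   # ['match', 'match', 'done']
print(run_dpda(ANBN, "q0", "qf", "aab"))    # None
```

Both conditions for determinism hold for ANBN: states q0, q2, q5, and q8 each have a single push or pop transition, q1, q3, and q7 have only read transitions, and q4 and q6 have only "top i" transitions.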

As we acknowledged, most of the above discussion is
from DeRemer. We now relate some of the above concepts and
terminology to their particular role in the development of
the fundamental algorithms of LIS.

LIS accepts the Backus Naur Form (BNF) as a syntactic
meta-language for representing context-free grammars. A BNF
specification is composed of BNF rules of the form
A ::= B !, where A is the left part of the rule, and B is
the right part of the rule. The left part of the rule is
the non-terminal that is defined in the rule. The right
part of the rule is composed of one or more alternative
definitions of the left part, separated by the symbol "!".
The alternatives of the right part, in conjunction with the
non-terminal left part, correspond to the productions of the
context-free grammar. Thus, a BNF rule with n alternative
right parts corresponds to n productions, and we often use
the terms alternative and production interchangeably. In a
BNF specification containing n rules, the rules are
arbitrarily numbered from 1 to n. In order to uniquely
identify the alternatives (productions) of the
specification, we also number the alternatives of each rule
from one to the number of alternatives in the rule. Thus,
the j-th alternative of the i-th rule is uniquely identified
as (i, j).

The non-terminal and terminal symbols of a BNF
specification are represented in the following way:

1. The non-terminal symbols are represented as
character strings enclosed in angle brackets
(e.g. <character string>).

2. The terminal symbols are all character strings
of the specification excluding non-terminals
and the character strings "::=", "!",
"<", ">", blank, tab, new-line, and new-page,
unless these character strings are "escaped"
(see Appendix A).

The goal symbol of the BNF specification is either
<primary_non_terminal> or <lexical_non_terminal>, depending
on whether the specified grammar is primary or lexical,
respectively. The lexical grammar defines the structure of
the basic elements of the language (e.g. <identifier>,
<integer>) in terms of the legal terminal characters of the
language. The primary grammar defines the sentential forms
of the language in terms of these basic lexical constructs.
In those language definitions in which both lexical and
primary grammars are specified, both <primary_non_terminal>
and <lexical_non_terminal> must be defined. A more
extensive discussion of the structure and format of the LIS
Backus Naur Form is given in Appendix A.

The previously defined context-free grammar, G, can be
expressed in BNF as follows:

(1) <primary_non_terminal> ::= ⊥ <E> ⊥ !

(2) <E> ::= <E> + <T> !
            <T> !

(3) <T> ::= <P> ↑ <T> !
            <P> !

(4) <P> ::= i !
            ( <E> ) !
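The numbering of rules and alternatives can be illustrated with a small Python sketch that splits BNF text into rules and identifies each alternative by its pair (i, j). It is a simplification, not the LIS Pre-Processor: symbols are assumed to be separated by blanks, the escape mechanism of Appendix A is ignored, and the names parse_bnf and number_alternatives are hypothetical. The grammar text is our reading of grammar G, with ⊥ and ↑ as its special symbols.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, List[List[str]]]          # (left part, alternatives)


def parse_bnf(text: str) -> List[Rule]:
    """Split 'A ::= B1 ! B2 ! ...' rules into left parts and alternatives."""
    tokens = text.split()
    rules: List[Rule] = []
    i = 0
    while i < len(tokens):
        if i + 1 >= len(tokens) or tokens[i + 1] != "::=":
            raise ValueError(f"expected '::=' after {tokens[i]!r}")
        left, i = tokens[i], i + 2
        alternatives: List[List[str]] = []
        current: List[str] = []
        # Consume the right part until the next "X ::=" begins a new rule.
        while i < len(tokens) and not (i + 1 < len(tokens) and tokens[i + 1] == "::="):
            if tokens[i] == "!":
                alternatives.append(current)     # "!" ends one alternative
                current = []
            else:
                current.append(tokens[i])
            i += 1
        if current:
            raise ValueError(f"alternative of {left} not terminated by '!'")
        rules.append((left, alternatives))
    return rules


def number_alternatives(rules: List[Rule]) -> Dict[Tuple[int, int], Tuple[str, List[str]]]:
    """Identify the j-th alternative of the i-th rule as (i, j), both counted from 1."""
    return {(i, j): (left, alt)
            for i, (left, alts) in enumerate(rules, start=1)
            for j, alt in enumerate(alts, start=1)}


GRAMMAR_G = """
<primary_non_terminal> ::= ⊥ <E> ⊥ !
<E> ::= <E> + <T> ! <T> !
<T> ::= <P> ↑ <T> ! <P> !
<P> ::= i ! ( <E> ) !
"""
PRODUCTIONS = number_alternatives(parse_bnf(GRAMMAR_G))
print(PRODUCTIONS[(2, 2)])   # ('<E>', ['<T>'])
```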

In the previous discussion on context-free grammars the
term language was formally defined. While we will maintain
this formal definition, we will also use the term (as well
as the term artificial language) in an informal sense in
which we include the semantics associated with the syntactic
constructs. Thus, we refer to the language (syntax and
semantics) PL/I, the language Algol, and languages that
constitute the logical interface between the end-user and an
interactive computational facility, i.e. interactive
languages.

The term parser will be retained as previously defined,
and we shall differentiate between lexical parsers and
primary parsers where such differentiation is appropriate.
We also make occasional reference to the term recognizer,
which for our purposes is synonymous with the term parser.

The fundamental structure of the Language
Implementation System is indicated in Figure II.1. Language
development resolves into two interacting phases: Processor
Generation and Processor Execution.


II.C.1 Processor Generation


Processor Generation consists of the execution of the
LIS Pre-Processor and the LIS CLR(k) Generator for purposes
of computing the following functional results from the
submitted LIS Language Definition:

a. The DPDAs which are used to "drive" LIS
Processor Control in parsing legal Input Text
of the language. Note that our use of the
term DPDA is slightly different from that
defined in Section II.B: our present use of
the term includes only the finite control.
The input tape, output tape, and push down
stack are incorporated into LIS Processor
Control.

b. A PL/I procedure which represents the semantic
interpretation to be associated with the
language's syntactic constructs.


A precise specification of the format of an LIS
Language Definition may be found in Appendix A. For our
present purposes, however, we may consider an LIS Language
Definition to consist of:


[Figure II.1: Structure of the Language Implementation System]

a. A Backus Naur Form specification of the
syntax of the language being defined.

b. A PL/I specification of the semantics of the
language, expressed in-line with the BNF
specification on a per-BNF-rule basis. In
specifying the semantics of a particular
syntactic construct, the language
designer/implementer uses PL/I to define the
actions that his language processor is to
perform when the corresponding syntactic
construct is recognized.


The LIS Pre-Processor performs the following functions:

a. The LIS Pre-Processor computes the Semantic
Source Segment from the Language Definition.

b. The LIS Pre-Processor performs various
validity checks and analysis procedures on the
submitted grammar, delivering diagnostics for
those checks and analysis procedures that the
grammar fails to satisfy. Certain of the
checks and procedures are of a warning nature
only; failing to satisfy these will not
prevent the activation of the LIS CLR(k)
Generator. Others are of a fatal nature, and
must be satisfied if the LIS CLR(k) Generator
is to be activated. The checks and analysis
procedures that have been implemented on LIS
are discussed from the language design
viewpoint in Appendix A, and from the LIS
system design viewpoint in Appendix B.


LIS CLR(k) Generator:

If  the  LIS  Pre-Processor  encounters  no  fatal  errors  in 
the  LIS  Language  Definition,  control  is  automatically  passed 
to  the  LIS  CLR(k)  Generator.  This  phase  of  the  system 
attempts  to  compute  a  CLR(k)  parser  for  k  less  than  or  equal 
to  a  certain  internally  set  value  (currently  set  at  3).  The 
CLR(k) (Comprehensive Left to Right, looking ahead a maximum
of k symbols) grammars constitute a large subset of the
LR(k) grammars, which in turn possess the following
characteristics:

a. The LR(k) grammars generate exactly the
deterministic context-free languages.

b. The LR(k) grammars represent the largest class
of grammars known to be parsable in linear
time (proportional to the length of the input
text) during a single left to right scan.

c. A grammar satisfying the LR(k) condition is
unambiguous.

Intuitively,  the  LR(k)  condition  implies  that  the 
identity  of  a  particular  syntactic  construct  may  be 
ascertained  by  looking  indefinitely  far  to  the  left  and  at 
most  k  symbols  to  the  right  of  the  current  position  in  the 
parse  (symbols  meaning  characters  or  lexical  constructs, 
depending  on  whether  a  lexical  or  primary  grammar  is  being 
defined,  respectively).  This  is  an  extremely  comprehensive 
condition,  and  covers  virtually  all  artificial  languages 
that  are  likely  to  be  of  "practical"  interest. 

In  attempting  to  compute  the  parsers,  the  LIS  CLR(k) 
Generator  delivers  diagnostics  for  those  areas  of  the 
language  that  do  not  satisfy  the  CLR(k)  condition.  These 
diagnostics  include  sufficient  information  on  the 
language's  local  ambiguities  to  enable  the  language 
designer/implementer to modify the syntax of his language
in  order  to  make  it  CLR(k).  Assuming  that  the  grammar  is 
CLR(k),  the  functional  output  of  the  LIS  CLR(k)  Generator  is 
a  segment  containing  one  or  two  DPDAs,  depending  on  whether 
a  lexical  parser,  a  primary  parser,  or  both,  are  computed. 
The  DPDAs,  in  combination  with  LIS  Processor  Control, 
constitute  the  parsers  for  the  processor  of  the  language 
being  defined. 


As a preview to the treatment in Chapter III, we note
that the following algorithms are sequentially invoked by
LIS in producing a DPDA from a given grammar:

1. Compute CFSM

This algorithm is invoked to compute the
Characteristic Finite State Machine of the
submitted grammar.

2. Convert CFSM to DPDA

This algorithm is invoked to convert the CFSM
into a Deterministic Push Down Automaton. In
converting each inadequate state of the CFSM,
the algorithm invokes the CLR(k) look-ahead
algorithm to associate a set of look-ahead
transition strings with each of that state's
generated DPDA states.

3.  Optimize  DPDA 

This  algorithm  is  invoked  to  perform 
optimizations  on  the  DPDA  that  will  result  in 
enhancements  to  the  space  and  time  efficiency 
of  the  resulting  parser. 


II.C.2 Processor Execution


A processor for the artificial language specified by
the LIS Language Definition is synthesized by combining the
DPDAs and the Semantic Object Segment with LIS Processor
Control.

LIS Processor Control coordinates the overall
language processing activity. In parsing the Input Text,
it is "driven" by the DPDAs, and upon recognition of a
particular syntactic construct, it activates the semantics
associated with that construct. It is the responsibility of
the activated semantics subsequently to return control to
LIS Processor Control so that language processing may
continue.

The semantics can access the Input Text directly, and
the normal situation is for LIS Processor Control to
coordinate these accesses by directing the semantics to
specific text such as identifiers, keywords, symbol tables,
etc. As indicated in Figure II.1, there is no explicit
output from Processor Execution. It is therefore the
responsibility of the semantics to manage its own output, as
well as its alternate input files, temporary files, symbol
tables, etc.
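The cooperation between Processor Control and the semantics can be sketched in Python as below. This is a hypothetical illustration, not the LIS interface: we assume the parser reports each recognized lexical construct as a ("shift", value) event and each recognized production (i, j) of n symbols as a ("reduce", (i, j), n) event, and we keep a value stack, whereas in LIS the semantics access the Input Text directly. The semantics are keyed by the production numbers of grammar G, with ↑ read as exponentiation (an assumption); they keep their own output list, since Processor Execution has no explicit output.

```python
from typing import Callable, Dict, List, Tuple

Production = Tuple[int, int]                 # (rule i, alternative j)
Semantics = Dict[Production, Callable[[list], object]]


def processor_control(events: List[tuple], semantics: Semantics) -> list:
    """Dispatch parser events to the semantics (hypothetical event format)."""
    values: list = []
    for event in events:
        if event[0] == "shift":              # a terminal or lexical construct was read
            values.append(event[1])
        else:                                # ("reduce", (i, j), n): production (i, j) recognized
            _, production, n = event
            operands = values[len(values) - n:]
            del values[len(values) - n:]
            # Activate the semantics for (i, j); returning from the call
            # hands control back to processor_control.
            values.append(semantics[production](operands))
    return values


output: List[object] = []                    # the semantics manage their own output


def goal(ops: list) -> object:
    output.append(ops[1])                    # (1, 1): ⊥ <E> ⊥, emit the value of <E>
    return ops[1]


SEMANTICS_G: Semantics = {
    (1, 1): goal,
    (2, 1): lambda ops: ops[0] + ops[2],     # <E> + <T>
    (2, 2): lambda ops: ops[0],              # <T>
    (3, 1): lambda ops: ops[0] ** ops[2],    # <P> ↑ <T>, read here as exponentiation
    (3, 2): lambda ops: ops[0],              # <P>
    (4, 1): lambda ops: ops[0],              # i (its value arrives with the shift)
    (4, 2): lambda ops: ops[1],              # ( <E> )
}

# Events a parser might report for  ⊥ i + i ↑ i ⊥  with i = 2, 3, 2.
EVENTS = [
    ("shift", "⊥"), ("shift", 2), ("reduce", (4, 1), 1), ("reduce", (3, 2), 1),
    ("reduce", (2, 2), 1), ("shift", "+"), ("shift", 3), ("reduce", (4, 1), 1),
    ("shift", "↑"), ("shift", 2), ("reduce", (4, 1), 1), ("reduce", (3, 2), 1),
    ("reduce", (3, 1), 3), ("reduce", (2, 1), 3), ("shift", "⊥"), ("reduce", (1, 1), 3),
]

result = processor_control(EVENTS, SEMANTICS_G)
print(output)   # [11]
```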

Chapter III


The Fundamental Algorithms
of the
Language Implementation System

III.A Introduction

In this chapter, we present the fundamental algorithms
of the Language Implementation System. Since this is an
algorithmic presentation, we avoid much of the design and
implementation detail. The reader interested in such detail
is referred to Appendix B.

III.B An Example Grammar

Our approach in presenting the algorithms of LIS is to
consider it our task to provide a processor (primary parser
plus semantic controller) for a particular language, and to
follow the language through the algorithms of Processor
Generation that will compute from its primary grammar the
required CLR(1) DPDA. We then present the CLR(1) parsing
algorithm and thereby synthesize our language processor.
The grammar for which we will produce a primary parser is as
follows:


<lexical_non_terminal> ::=
    <identifier> !
    <integer> !

<identifier> ::=
    <identifier> a->z !
    <identifier> 0->9 !
    <identifier> _ !
    a->z !

<integer> ::=
    <integer> 0->9 !
    0->9 !

<primary_non_terminal> ::=
    <assignment_statement> !

<assignment_statement> ::=
    <identifier> = <expression> ; !

<expression> ::=
    <expression> + <term> !
    <term> !

<term> ::=
    <term> * <factor> !
    <factor> !

<factor> ::=
    <identifier> !
    <integer> !


This grammar defines a simple assignment statement
language, the goal symbol of which is
<primary_non_terminal>. It also defines
<lexical_non_terminal>, and this definition serves to
establish the "division of labor" between the constructs
to be recognized by the primary parser, and those to be
recognized by the lexical parser. The convention on LIS is
that, with respect to the primary grammar, those
non-terminals defined to be <lexical_non_terminal>s may be
treated in the same way as terminal symbols. This point
should be kept in mind during the subsequent discussions
when we refer to terminal symbols and transitions on
terminal symbols in the CFSM and DPDA.

At the end of this chapter, we demonstrate the CLR(1)
parsing algorithm on sample text of our example grammar, and
for this reason we will invoke LIS to compute both a lexical
and a primary parser. However, our discussions on the
fundamental algorithms will involve only the primary
grammar.


III.C The Algorithm for Computing the
Characteristic Finite State Machine

The algorithm that we use for computing the
Characteristic Finite State Machine (CFSM) is that of
Knuth-Earley. The CFSM is computed by iteratively
generating unique configurations of the grammar and
associating a CFSM state with each such configuration. A
configuration may be thought of as a state of the grammar in
which productions (alternatives) belonging to the state have
exactly one of their symbols marked. The marked symbol of a
production may be a non-terminal symbol, a terminal symbol,
or the production's #-symbol (application of alternative).
Each marked production symbol of a configuration is
associated with exactly one of the transitions of the
corresponding CFSM state. It may be that several marked
symbols correspond to the same transition.

In the subsequent discussion, we refer to the process
of completing a configuration, by which we mean repeatedly
scanning the particular configuration and adding to it new
marked productions. One necessary condition for adding a
marked production to a configuration is that there already
exist a marked production whose marked symbol is a
non-terminal. This being the case, we consider as
candidates for marking and addition to the configuration
the leftmost symbols of all productions defining that
non-terminal. An additional necessary condition for adding
a marked production to a configuration is that the candidate
(in the above sense) for addition must not already be a
member of the configuration. Taken together, these
conditions are both necessary and sufficient for adding
marked productions to a configuration. The configuration is
complete when a scan of the configuration does not result in
the addition of new marked productions.
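Completing a configuration can be sketched in Python as follows, assuming a marked production is represented as a (left part, right part, marker position) triple, where a marker position equal to the length of the right part stands for the marked #-symbol. The dictionary PRIMARY encodes the primary grammar of Section III.B, reading its statement terminator as ";" (our reading of the listing); lexical non-terminals such as <identifier> are simply absent from it, so they behave as terminals. Instead of rescanning the whole configuration, the sketch scans each newly added production once, which reaches the same completed configuration.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int]      # (left part, right part, marker position)

# Primary grammar of Section III.B; ";" as the statement terminator is our reading.
PRIMARY: Grammar = {
    "<primary_non_terminal>": [("<assignment_statement>",)],
    "<assignment_statement>": [("<identifier>", "=", "<expression>", ";")],
    "<expression>": [("<expression>", "+", "<term>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("<identifier>",), ("<integer>",)],
}


def complete(config: List[Item], grammar: Grammar) -> List[Item]:
    """Add marked productions until a scan adds nothing new."""
    result = list(dict.fromkeys(config))     # keep order, drop duplicates
    k = 0
    while k < len(result):                   # every added production is scanned too
        _, rhs, pos = result[k]
        marked = rhs[pos] if pos < len(rhs) else None   # None: the #-symbol is marked
        if marked in grammar:                # a marked (primary) non-terminal
            for alternative in grammar[marked]:
                candidate = (marked, alternative, 0)    # mark its leftmost symbol
                if candidate not in result:  # must not already be a member
                    result.append(candidate)
        k += 1
    return result


config_4 = complete([("<assignment_statement>", ("<identifier>", "=", "<expression>", ";"), 2)], PRIMARY)
print(len(config_4))   # 7
```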

Since it is our task to produce a parser for our
primary grammar, we know that we must be able to recognize
an <assignment_statement>. We therefore initialize
configuration-1 with the leftmost symbol of the production
defining <primary_non_terminal> as follows (we indicate
each marked symbol by enclosing it in square brackets):

<primary_non_terminal> ::=
    [<assignment_statement>] !

However, in the process of recognizing an
<assignment_statement>, we must necessarily recognize an
<identifier> (by definition of <assignment_statement>), so
that we add to configuration-1 the following marked
production:

<assignment_statement> ::=
    [<identifier>] = <expression> ; !

Since <identifier> is a <lexical_non_terminal>, we
treat it as a terminal symbol, and therefore configuration-1
is complete as follows:

Configuration-1

<primary_non_terminal> ::=
    [<assignment_statement>] !

<assignment_statement> ::=
    [<identifier>] = <expression> ; !

This configuration corresponds to our first CFSM state,
which is automatically accessed when the parser is
initiated. As we previously indicated, each marked
production of a configuration corresponds to exactly one
transition of the associated CFSM state. However, we do not
yet know the destinations of the state transitions, so that
state-1 is initially:

State: 1    Accessed by:

    <assignment_statement>    go to ?
    <identifier>              go to ?

In order to determine the destination states of the
transitions in state-1, it is necessary to go back to
configuration-1 and consider the two configurations that
would result by completing the initial configurations formed
by advancing the marked symbols of configuration-1 one
position to the right. Advancing the symbol marker
corresponds to "reading" the symbol previously marked, which
is exactly what we want. In the case of the first marked
production of configuration-1, advancing the marker by one
position results in the following initial configuration:

<primary_non_terminal> ::=
    <assignment_statement> [!]

This newly marked symbol corresponds to the application
of the production. Since the initial configuration has no
marked non-terminals, the configuration is complete. During
the process of CFSM computation, each configuration that we
complete is considered temporary until it is determined
whether the configuration has been previously generated. If
the configuration has been previously generated, then the
references that would be made to the temporary configuration
will instead be made to the previously generated
configuration, and the temporary configuration will be
discarded. If the temporary configuration has not been
previously generated, then it is retained. A scan of the
configurations thus far generated (i.e. configuration-1)
reveals no match with the temporary configuration, so that
it is retained as configuration-2:
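The handling of temporary configurations can be sketched in Python, assuming marked productions are (left part, right part, marker position) triples and that the primary grammar's statement terminator is ";" (our reading). The hypothetical function advance_on advances the marker past one symbol wherever that symbol is marked and completes the result; retain_or_reuse returns the number of an earlier configuration containing exactly the same marked productions, or else retains the temporary configuration under the next number. Comparing configurations as sets of marked productions is our assumption about what constitutes a match.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int]      # (left part, right part, marker position)

PRIMARY: Grammar = {
    "<primary_non_terminal>": [("<assignment_statement>",)],
    "<assignment_statement>": [("<identifier>", "=", "<expression>", ";")],
    "<expression>": [("<expression>", "+", "<term>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("<identifier>",), ("<integer>",)],
}


def complete(config: List[Item], grammar: Grammar) -> List[Item]:
    """Add marked productions for every marked non-terminal until nothing is new."""
    result = list(dict.fromkeys(config))
    k = 0
    while k < len(result):
        _, rhs, pos = result[k]
        if pos < len(rhs) and rhs[pos] in grammar:
            for alternative in grammar[rhs[pos]]:
                if (rhs[pos], alternative, 0) not in result:
                    result.append((rhs[pos], alternative, 0))
        k += 1
    return result


def advance_on(config: List[Item], symbol: str, grammar: Grammar) -> List[Item]:
    """Advance the marker past `symbol` wherever it is marked, then complete."""
    moved = [(lhs, rhs, pos + 1) for (lhs, rhs, pos) in config
             if pos < len(rhs) and rhs[pos] == symbol]
    return complete(moved, grammar)


def retain_or_reuse(configurations: List[List[Item]], temp: List[Item]) -> int:
    """Return the number of a matching earlier configuration, or retain temp."""
    for number, existing in enumerate(configurations, start=1):
        if set(existing) == set(temp):
            return number                # previously generated: temp is discarded
    configurations.append(temp)          # new: retained as the next configuration
    return len(configurations)


config_1 = complete([("<primary_non_terminal>", ("<assignment_statement>",), 0)], PRIMARY)
configurations = [config_1]
print(retain_or_reuse(configurations, advance_on(config_1, "<assignment_statement>", PRIMARY)))  # 2
print(retain_or_reuse(configurations, advance_on(config_1, "<identifier>", PRIMARY)))            # 3
print(retain_or_reuse(configurations, advance_on(config_1, "<identifier>", PRIMARY)))            # 3
```

The last call regenerates configuration-3, finds the match, and reuses its number, so only three configurations are retained.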

Configuration-2

<primary_non_terminal> ::=
    <assignment_statement> [!]

Thus,  state-2  becomes  the  destination  state  of  the 
first  transition  of  state-1. 

In determining the destination state of the second
transition of state-1, we advance the marker on all marked
productions of configuration-1 in which the marked symbol is
<identifier>. Doing so produces the following initial
configuration:

<assignment_statement> ::=
    <identifier> [=] <expression> ; !

Since the configuration does not contain a marked
non-terminal, it is complete. A scan of the configurations
thus far generated (configuration-1 and configuration-2)
reveals no match with this temporary configuration, so that
it is retained as configuration-3:

Configuration-3

<assignment_statement> ::=
    <identifier> [=] <expression> ; !

Thus,  state-3  becomes  the  destination  state  of  the 
second  transition  of  state-1. 

CFSM state-1 is now computed:

State: 1    Accessed by:

    <assignment_statement>    go to 2
    <identifier>              go to 3

Turning now to state-2, we see that it is accessed by
<assignment_statement>, and its corresponding configuration
(configuration-2) consists of a single marked production.
The marked symbol of the production is the application of
alternative (4, 1). Since it is the application symbol that
is marked, the marker cannot be further advanced within the
production, and the corresponding state transition has no
destination state.


CFSM state-2 is now computed:

State: 2    Accessed by: <assignment_statement>

    Apply (4, 1)

Turning to state-3, we see that it is accessed by
<identifier>, and that its corresponding configuration
(configuration-3) consists of a single marked production.
The symbol marked in configuration-3 is "=", so that state-3
is, initially:

State: 3    Accessed by: <identifier>

    =    go to ?


In determining the destination state of the transition
of state-3, we advance the marker in the marked production
associated with the transition one position to the right.
This yields the following initial configuration:

<assignment_statement> ::=
    <identifier> = [<expression>] ; !


Completing this configuration results in a
configuration matching none of the previously existing
configurations, so that the new configuration is retained as
configuration-4:

Configuration-4

<assignment_statement> ::=
    <identifier> = [<expression>] ; !

<expression> ::=
    [<expression>] + <term> !
    [<term>] !

<term> ::=
    [<term>] * <factor> !
    [<factor>] !

<factor> ::=
    [<identifier>] !
    [<integer>] !


CFSM state-3 is now computed:

State: 3    Accessed by: <identifier>

    =    go to 4


Turning now to state-4, we see that it is accessed by
"=", and that its corresponding configuration consists of 7
marked productions. In determining the destination state of
the first transition of state-4, we advance the marker on
all marked productions of configuration-4 in which the
marked symbol is <expression>. This yields an initial
configuration which has no marked non-terminals, and which
is therefore complete. The configuration matches none of
the previously existing configurations, and is therefore
retained as configuration-5:


Configuration-5

<assignment_statement> ::=
    <identifier> = <expression> [;] !

<expression> ::=
    <expression> [+] <term> !

Thus, state-5 becomes the destination state of the
first transition of state-4.

In determining the destination state for the second
transition of state-4, we advance the marker on all marked
productions of configuration-4 in which the marked symbol is
<term>. This yields an initial configuration which has no
marked non-terminals, and which is therefore complete. The
configuration matches none of the previously existing
configurations, and is therefore retained as
configuration-6:

Configuration-6

<expression> ::=
    <term> [!]

<term> ::=
    <term> [*] <factor> !

Thus,  state-6  becomes  the  destination  state  of  the 
second  transition  of  state-4. 

In determining the destination state of the third
transition of state-4, we advance the marker on all marked
productions of configuration-4 in which the marked symbol is
<factor>. This yields an initial configuration which has no
marked non-terminals, and which is therefore complete. The
configuration matches none of the previously existing
configurations, and is therefore retained as
configuration-7:


Configuration-7

<term> ::=
    <factor> [!]

Thus, state-7 becomes the destination state of the
third transition of state-4.

In determining the destination state of the fourth
transition of state-4, we advance the marker on all marked
productions of configuration-4 in which the marked symbol is
<identifier>. This yields an initial configuration which
has no marked non-terminals, and which is therefore
complete. The configuration matches none of the previously
existing configurations, and is therefore retained as
configuration-8:

Configuration-8

<factor> ::=
    <identifier> [!]

Thus, state-8 becomes the destination state of the
fourth transition of state-4.

In determining the destination state of the fifth
transition of state-4, we advance the marker on all marked
productions of configuration-4 in which the marked symbol is
<integer>. This yields an initial configuration which has
no marked non-terminals, and which is therefore complete.
The configuration matches none of the previously existing
configurations, and is therefore retained as
configuration-9:

Configuration-9

<factor> ::=
    <integer> [!]


Thus,  state-9  becomes  the  destination  state  of  the 
fifth  transition  of  state-4. 


CFSM state-4 is now computed:

State: 4    Accessed by: =

    <expression>    go to 5
    <term>          go to 6
    <factor>        go to 7
    <identifier>    go to 8
    <integer>       go to 9


We continue the state generation process until all CFSM
states are computed. The process terminates when there are
no more configurations to be processed into states. The
completely computed CFSM is given at the end of this
chapter. In addition to the information which we have been
computing, the CFSM identifies, for each CFSM state, the
type of state, the corresponding DPDA state, and in the case
of an inadequate state, the DPDA apply state that
corresponds to each apply transition of the state. For the
apply transitions, the number of symbols in the alternative
being applied is given, as is the non-terminal defined by
the alternative.


The CFSM computation algorithm illustrated above may be
formally specified as follows:

1. Set the number of configurations generated,
n_configurations, equal to one.

Set the number of CFSM states computed,
n_cfsm_states, equal to one.

Initialize configuration-1 with the leftmost
symbol of all productions (alternatives)
defining the grammar goal symbol
(<primary_non_terminal> or
<lexical_non_terminal>, as appropriate).
Complete configuration-1.

Go to step 2.

2. If n_cfsm_states is greater than
n_configurations, then the CFSM is generated
and the algorithm terminates.

Otherwise, go to step 3.

3.  Advance  to  the  next  marked  symbol  of  the 
(n_cfsm_states)-th  configuration  that  has  not 
yet  been  converted  into  a  state  transition  of 
the  (n_cfsm_states)-th  state. 

If none remain, then add one to n_cfsm_states
and  go  to  step  2. 

Otherwise,  go  to  step  4. 

4. The symbol detected in step 3 is the
transition symbol for a new transition of the
(n_cfsm_states)-th state. The new symbol may
be a terminal symbol, a non-terminal symbol,
or a #-symbol (application of production).

Scan the rest of the (n_cfsm_states)-th
configuration looking for all marked
productions whose marked symbol is the same as
the new transition symbol. These marked
productions, as well as the one detected in
step 3, should be identified so that they will
not be considered in subsequent iterations of
step 3 for this state.

Go to step 5.

5.  If  the  new  transition  symbol  is  a  #-symbol, 
then  the  transition  is  determined,  so  go  to 
step  3. 

Otherwise,  go  to  step  6. 

6. In order to determine the destination state of
the new transition, initialize the
(n_configurations + 1)-th configuration with
the marked productions of the
(n_cfsm_states)-th configuration that
satisfied the conditions in steps 3 and 4.
Advance the symbol markers of these marked
productions one position to the right.

Go  to  step  7. 

7. Scan the n_configurations configurations
looking for a match with the
(n_configurations + 1)-th configuration. If a
match is found, then:

Set the destination state of the new
transition to the number of the
configuration matched.

Destroy the (n_configurations + 1)-th
configuration.

Go to step 3.

If a match is not found, then:

Set the destination state of the new
transition to n_configurations + 1.

Retain the (n_configurations + 1)-th
configuration.

Add one to n_configurations.

Go to step 3.
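The seven steps above can be sketched in Python. This is a minimal illustration under our own assumptions, not the LIS implementation: the three-production grammar is hypothetical, a marked production is a (production index, marker position) pair, ("#", p) stands for the #-symbol of production p, and the helper names complete, marked_symbol, and build_cfsm are ours. Unlike step 6 as written, the sketch completes each new configuration before the comparison of step 7, so that configurations can be compared directly.

```python
# Hypothetical example grammar: P0 S -> E, P1 E -> E + a, P2 E -> a
GRAMMAR = [("S", ("E",)), ("E", ("E", "+", "a")), ("E", ("a",))]


def complete(items, grammar):
    """Complete a configuration: add every alternative of each marked non-terminal."""
    closed, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = grammar[p][1]
        if dot < len(rhs):
            for q, (lhs, _) in enumerate(grammar):
                if lhs == rhs[dot] and (q, 0) not in closed:
                    closed.add((q, 0))
                    work.append((q, 0))
    return frozenset(closed)


def marked_symbol(item, grammar):
    """Symbol under the marker; ("#", p) stands for the #-symbol (apply production p)."""
    p, dot = item
    rhs = grammar[p][1]
    return rhs[dot] if dot < len(rhs) else ("#", p)


def build_cfsm(grammar, goal):
    # Step 1: configuration-1 holds the goal alternatives marked at their leftmost symbol.
    start = {(p, 0) for p, (lhs, _) in enumerate(grammar) if lhs == goal}
    configs = [complete(start, grammar)]
    transitions = []
    state = 0                                   # 0-based index of the state being expanded
    while state < len(configs):                 # step 2: stop once every configuration is a state
        groups = {}                             # steps 3-4: group marked productions by marked symbol
        for item in sorted(configs[state]):
            groups.setdefault(marked_symbol(item, grammar), []).append(item)
        trans = {}
        for sym, items in groups.items():
            if isinstance(sym, tuple):          # step 5: a #-symbol needs no destination
                trans[sym] = None
                continue
            # Step 6: advance the markers (completed here so configurations compare directly).
            new = complete({(p, dot + 1) for p, dot in items}, grammar)
            if new in configs:                  # step 7: reuse the matching configuration ...
                trans[sym] = configs.index(new) + 1
            else:                               # ... or retain the new one as a new state
                configs.append(new)
                trans[sym] = len(configs)
        transitions.append(trans)
        state += 1
    return configs, transitions


configs, transitions = build_cfsm(GRAMMAR, "S")
for number, trans in enumerate(transitions, start=1):
    print(number, trans)
```

For this grammar the sketch yields five states; state 2 is inadequate, since it holds both a read transition on "+" and an apply transition for S -> E.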


In  the  implementation  of  the  CFSM  computation 
algorithm,  an  efficient  computational  notation  has  been 
adopted  for  representing  configurations.  The  notation 
involves associating a position in a bit vector with each
symbol  of  each  production  (alternative)  of  the  grammar, 
including  #-symbols.  Details  of  the  representation  of 
configurations  and  of  the  CFSM  computation  algorithm  may  be 
found  in  Appendix  B. 
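A minimal sketch of a bit-vector representation of configurations, in Python. The layout of Appendix B is not reproduced here: we simply assume one bit per right-side symbol of each production plus one bit for its #-symbol, with productions laid out consecutively. The grammar and the function names bit_layout, encode, and decode are hypothetical.

```python
# Hypothetical grammar as (lhs, rhs) pairs; the bit layout is our own assumption.
GRAMMAR = [("S", ("E",)), ("E", ("E", "+", "a")), ("E", ("a",))]


def bit_layout(grammar):
    """One bit per right-side symbol, plus one for the trailing #-symbol, per production."""
    offsets, width = [], 0
    for _, rhs in grammar:
        offsets.append(width)
        width += len(rhs) + 1
    return offsets, width


def encode(items, offsets):
    """Configuration -> integer bit vector; marked production (p, dot) sets bit offsets[p] + dot."""
    vector = 0
    for p, dot in items:
        vector |= 1 << (offsets[p] + dot)
    return vector


def decode(vector, offsets, width):
    """Integer bit vector -> set of (p, dot) marked productions."""
    items = set()
    for bit in range(width):
        if vector >> bit & 1:
            p = max(i for i, start in enumerate(offsets) if start <= bit)
            items.add((p, bit - offsets[p]))
    return items


offsets, width = bit_layout(GRAMMAR)          # offsets == [0, 2, 6], width == 8
kernel = encode({(0, 0), (1, 0)}, offsets)     # S -> .E and E -> .E + a
# Advancing every marker one position is a single left shift; this is valid because
# no #-symbol bit (which has no successor position) is set in the vector being shifted.
advanced = kernel << 1
print(decode(advanced, offsets, width))
```

Under this layout, the match in step 7 of the CFSM computation reduces to an integer comparison.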
III.D The Algorithm for Converting the CFSM into a DPDA

Our  approach  in  presenting  the  algorithm  for  converting 
the  Characteristic  Finite  State  Machine  (CFSM)  into  a 
Deterministic  Push  Down  Automaton  (DPDA)  is  first  to  present 
the LR(0) CFSM parser. The inefficiencies and limitations
inherent in the LR(0) parser will motivate first the stack
algorithm  and  then  the  algorithm  for  converting  the  CFSM 
into  a  DPDA.  We  then  present  the  conversion  algorithm  and 
apply  it  to  the  CFSM  computed  in  the  last  section. 

III.D.1 The LR(0) CFSM Parser


A grammar whose CFSM contains no inadequate states is
LR(0). We now develop the CFSM parser for such a grammar
and examine its limitations and inefficiencies. In the
following discussion, we assume that we are to parse a
sentence, T, of an LR(0) grammar, G. The parser operates on
the canonical forms, CF, of G, and we initialize the
canonical form with T, that is, CF = T.

1.  Initialize  the  CFSM  in  state-1,  and  apply  it 
to  the  canonical  form,  CF. 

The parser will take transitions in the CFSM
that correspond to the symbols of CF until an
apply state is reached.

Go to step 2.

2. The production to be applied in that apply
state is A->w.

Output A and replace the w just read in CF
with A. This is the new CF.

Go  to  step  3. 

3.  If  CF  =  S  (goal  symbol  of  grammar),  then  the 
parse  is  complete. 

Otherwise, go to step 1.
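The three steps can be sketched in Python for a hypothetical LR(0) grammar, S -> ( L ) | a and L -> L , S | S, whose CFSM we have built by hand (the state numbers and the names READ, APPLY, and lr0_parse are ours). The inner loop restarts at state-1 on every iteration, which is exactly the rescanning that the stack algorithm removes.

```python
# Hypothetical LR(0) grammar:  S -> ( L ) | a ;  L -> L , S | S
READ = {                      # CFSM read transitions (terminal and non-terminal)
    1: {"(": 2, "a": 3},
    2: {"L": 4, "S": 5, "(": 2, "a": 3},
    4: {")": 6, ",": 7},
    7: {"S": 8, "(": 2, "a": 3},
}
APPLY = {3: ("S", 1), 5: ("L", 1), 6: ("S", 3), 8: ("L", 3)}   # apply state -> (A, |w|)


def lr0_parse(text, goal="S"):
    cf = list(text)           # canonical form, initially the sentence T
    output = []
    while cf != [goal]:       # step 3: stop when CF = S
        state, i = 1, 0       # step 1: rescan CF from state-1 every time
        while state not in APPLY:
            state = READ[state][cf[i]]   # a KeyError or IndexError here means a syntax error
            i += 1
        lhs, length = APPLY[state]       # step 2: apply A -> w
        output.append(lhs)
        cf[i - length:i] = [lhs]         # replace w with A to form the new CF
    return output


print(lr0_parse("(a,a)"))
```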

III.D.2 The Stack Algorithm

The LR(0) CFSM parser is grossly inefficient. Our
particular objection is its rescanning of previously scanned
portions of canonical forms. Since the parser is
deterministic, this rescanning is simply wasted processing.
Our  solution  is  to  save  information  on  the  parse  in  a  push 
down  stack  so  as  to  remove  the  requirement  for  rescanning. 
This  will  be  our  stack  algorithm. 

Consider a single iteration of the parser. The
canonical form for the iteration is initially CF' = rwb, and
after scanning rw, the parser ends up at an apply state in
which the production being applied is A->w. Application of
the production results in the new canonical form, CF = rAb.
However, applying the parser to CF will take it through the
same set of states in recognizing r (i.e., the CFSM is
deterministic), so that, had it remembered the state entered
just after reading r, it could start in that state on the
canonical form Ab and get the same result as if it had
started in state-1 on rAb. It is clear, then, that if we
keep a push down stack of all states entered by the CFSM, we
can maintain a history of the parse that is relevant at
production application time. Then, when the parser scans rw
and enters the state in which A->w is applied, it simply
pops |w| states off the stack (where |w| means the number of
symbols in w) and resumes the parse of Ab in the state that
is on the top of the stack. This process of popping and
resuming the parse in the top state of the stack is called
look-back. As before, the parser stops when CF = S.
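A sketch of the stack algorithm in Python, again over a hand-built CFSM for a hypothetical LR(0) grammar, S -> ( L ) | a and L -> L , S | S. The stack holds every state entered, starting with state-1; an apply state pops |w| states and resumes on A from the state left on top. The table and function names are ours, and the sketch assumes that no production has an empty right side.

```python
# Hypothetical LR(0) CFSM for:  S -> ( L ) | a ;  L -> L , S | S
READ = {
    1: {"(": 2, "a": 3},
    2: {"L": 4, "S": 5, "(": 2, "a": 3},
    4: {")": 6, ",": 7},
    7: {"S": 8, "(": 2, "a": 3},
}
APPLY = {3: ("S", 1), 5: ("L", 1), 6: ("S", 3), 8: ("L", 3)}   # apply state -> (A, |w|)


def stack_parse(text, goal="S"):
    stack, output, i = [1], [], 0       # push down stack of every CFSM state entered
    while True:
        state = stack[-1]
        if state in APPLY:
            lhs, length = APPLY[state]
            output.append(lhs)
            del stack[-length:]         # look-back: pop |w| states (|w| >= 1 assumed)
            if lhs == goal and stack == [1] and i == len(text):
                return output           # CF = S: the parse is complete
            stack.append(READ[stack[-1]][lhs])   # resume in the top state, on A
        else:
            stack.append(READ[state][text[i]])   # KeyError or IndexError: syntax error
            i += 1


print(stack_parse("(a,a)"))
```

Each input symbol is read exactly once, yet the sequence of applied non-terminals is the same as the rescanning parser would produce.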

The stack algorithm is equivalent to the LR(0) CFSM
parser  in  terms  of  the  resulting  parse,  and  it  is  far  more 
efficient.  Thus,  the  primary  motivation  for  development  of 
the  stack  algorithm  is  the  increase  in  parse  time  efficiency 
that  may  be  obtained.  Note,  however,  that  the  stack 
algorithm  has  only  one  interaction  with  the  symbols  of  the 
input  text,  the  initial  scan  up  to  the  application  of  the 
associated  production.  Thereafter,  all  manipulations  are  in 
terms  of  the  states  on  the  stack  and  the  non-terminal 
transitions  from  the  look-back  states.  Thus,  were  it  not 
for  the  need  for  the  non-terminal  transitions  from  look-back 
states,  all  non-terminal  transitions  of  the  CFSM  could  be 
ignored.  As  will  be  seen,  the  conversion  algorithm  takes 
care  of  this  by  modifying  the  look-back  states  in  such  a  way 
that  all  non-terminal  transitions  of  the  CFSM  are  deleted. 
Although the development of the stack algorithm is
important for reasons of efficiency, the primary motivation
for the development of the conversion algorithm about to be
presented is that very few grammars of "practical" value are
LR(0). We must therefore introduce look-ahead to resolve
the inadequate CFSM states.

III.D.3  The  Conversion  Algorithm 

The conversion algorithm will be applied to the CFSM
computed in Section III.C, and will produce the initial
(non-optimized) DPDA given at the end of the chapter. The
basic steps of the conversion algorithm are as follows:

1. Convert each CFSM state containing read
transitions into a separate DPDA read state,
eliminating all transitions on non-terminal
symbols.

2. Convert each CFSM apply transition into a
separate DPDA apply state with look-back
transitions.

3. Convert the CFSM inadequate states into DPDA
look-ahead states.

Conversion to Read States

The  conversion  of  CFSM  read  states  into  DPDA  read 
states  is  straightforward.  Simply  go  through  the  CFSM  and 
establish a DPDA read state for each CFSM read state. Then,
for each CFSM read state, move into the corresponding DPDA
read state only those transitions whose transition symbol
is a terminal symbol.


An inadequate CFSM state containing read transitions
also generates a DPDA read state. As with the conversion of
CFSM read states, we go through the inadequate CFSM state
and move only those transitions whose transition symbol is a
terminal symbol.

Applying the conversion processes just described to the
CFSM computed in Section III.C results in the DPDA read
states of the initial DPDA given at the end of the chapter.
The  parenthesized  state  numbers  in  the  DPDA  refer  to  states 
of  the  CFSM. 
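A minimal Python sketch of read-state conversion, assuming a hypothetical grammar S -> E, E -> E + a | a, whose hand-built CFSM has an inadequate state 2. The DPDA read states keep the CFSM numbers, mirroring the parenthesized numbers of the initial DPDA; the function name dpda_read_states is ours.

```python
# Hypothetical CFSM for P0 S -> E, P1 E -> E + a, P2 E -> a;
# ("#", p) marks an apply transition for production p.
CFSM = {
    1: {"E": 2, "a": 3},
    2: {("#", 0): None, "+": 4},     # inadequate: apply S -> E, or read "+"
    3: {("#", 2): None},
    4: {"a": 5},
    5: {("#", 1): None},
}
TERMINALS = {"+", "a"}


def dpda_read_states(cfsm, terminals):
    """One DPDA read state per CFSM state with terminal transitions; keep only those."""
    read_states = {}
    for state, trans in cfsm.items():
        kept = {sym: dst for sym, dst in trans.items() if sym in terminals}
        if kept:
            read_states[state] = kept    # keyed by CFSM number, like the parenthesized numbers
    return read_states


print(dpda_read_states(CFSM, TERMINALS))
```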

Conversion to Apply States

Each apply transition of the CFSM generates a separate
DPDA apply state. As indicated in the discussion of the
stack algorithm, the apply state can be provided with
look-back information which will enable the parse to resume
in the state entered just prior to beginning the scan of
the symbols on the right side of the applied production. In
the case of the stack algorithm, once it was determined in
which state to resume the parse, the first transition taken
was on the non-terminal produced in the apply state. Since
this is known a priori, the look-back information can be
extended so as to include the destination state of this
non-terminal transition. The extended look-back states are
referred to as the look-back transitions of the DPDA apply
state, or alternatively, as the top transitions, or the
apply transitions of the state. Incorporating the look-back
transitions into the parser guarantees that no transition
will ever be taken on a non-terminal, so that no
non-terminal transitions need appear in the DPDA.

We now give a procedure for computing the look-back
transitions of a particular DPDA apply state. The look-back
states of an apply transition of the CFSM are those CFSM
states from which a transition path over the symbols of the
applied production originates, the path terminating in
the CFSM state to which the apply transition belongs.
During the computation of the CFSM (specifically, during the
process of completing a configuration), it is a simple
matter to keep a list of the states from which originate
transition paths corresponding to the right side of the
grammar's productions. Following the completion of the CFSM
computation, we go through this list, and, for each entry,
go to the indicated CFSM state, C, and follow a transition
path corresponding to the right side of the indicated
production, P. This path will terminate in a CFSM state, A,
containing an application of the production, P. The DPDA
read state corresponding to CFSM state C will be a look-back
state of the DPDA apply state corresponding to the
application of P in A. The destination state of the apply
transition is simply the DPDA state corresponding to the
destination state of the transition in CFSM state C which
has as its transition symbol the non-terminal defined in
production P.

The  number  of  symbols  popped  in  a  particular  DPDA  apply 
state  will  be  one  less  than  the  number  of  symbols  in  the 
production  being  applied  in  that  state.  This  is  because 
only  the  read  states  of  the  DPDA  push  their  state  numbers 
onto  the  state  stack. 
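The look-back procedure and the pop count can be sketched in Python. The CFSM is hand-built for a hypothetical grammar, S -> ( L ) | a and L -> L , S | S, and ORIGINS stands in for the list of (C, P) pairs that would be recorded while completing configurations. Apply states are keyed by (CFSM state A, production P), keeping CFSM numbering as in the initial DPDA. Entries originating in state-1 contribute no look-back transition here, since state-1 has no transition on the goal symbol. All names are ours, and non-empty right sides are assumed.

```python
# Hypothetical grammar: P0 S -> ( L ), P1 S -> a, P2 L -> L , S, P3 L -> S
GRAMMAR = [("S", ("(", "L", ")")), ("S", ("a",)), ("L", ("L", ",", "S")), ("L", ("S",))]
CFSM = {                      # read transitions of the hand-built CFSM
    1: {"(": 2, "a": 3},
    2: {"L": 4, "S": 5, "(": 2, "a": 3},
    4: {")": 6, ",": 7},
    7: {"S": 8, "(": 2, "a": 3},
}
# (C, P): CFSM state C contains production P marked at its leftmost symbol
ORIGINS = [(1, 0), (1, 1), (2, 2), (2, 3), (2, 0), (2, 1), (7, 0), (7, 1)]


def look_back_transitions(grammar, cfsm, origins):
    apply_states = {}
    for c, p in origins:
        lhs, rhs = grammar[p]
        a = c
        for sym in rhs:                       # follow the path over the right side of P
            a = cfsm[a][sym]
        entry = apply_states.setdefault(
            (a, p), {"pop": len(rhs) - 1, "look_back": {}})   # only read states push
        if lhs in cfsm[c]:                    # look-back state C -> destination on lhs(P)
            entry["look_back"][c] = cfsm[c][lhs]
    return apply_states


for key, value in look_back_transitions(GRAMMAR, CFSM, ORIGINS).items():
    print(key, value)
```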

Applying the conversion processes just described to
the CFSM computed in Section III.C results in the DPDA apply
states indicated in the initial DPDA at the end of the
chapter.  Again,  the  parenthesized  state  numbers  refer  to 
CFSM  states. 

Conversion to Look-Ahead States

An inadequate CFSM state is inadequate in the sense
that the CFSM parsing algorithm cannot remain deterministic
when encountering such a state, because the state contains
no information capable of indicating which transition is
to be taken on entering it. To remedy this, each inadequate
CFSM state is converted into a DPDA look-ahead state. Also,
each inadequate state generates a separate DPDA apply state
for each of its apply transitions, and a single DPDA read
state for its set of read transitions (if any). In effect,
the look-ahead state contains the potential symbol strings
(look-ahead strings) that could be encountered on pursuing
transitions through the generated states. By pre-computing
and saving this look-ahead information, and associating each
look-ahead string with the appropriate generated state, we
maintain the determinism of our parsing algorithm: we can
compare the look-ahead symbol strings with the symbols
actually occurring ahead in the input text, and take the
transition on which we get a match. If we do not get a
match on any of the strings, then the input text contains a
syntax error.

The  actual  computation  of  the  look-ahead  symbol  strings 
is  done  by  the  look-ahead  algorithm,  which  we  are  about  to 
present.  By  applying  the  conversion  process  and  computing 
the  look-ahead  symbols  with  the  look-ahead  algorithm,  we 
convert  the  inadequate  states  of  the  CFSM  computed  in 
Section III.C into the DPDA look-ahead states in the initial
DPDA at the end of the chapter. The destination state
numbers  indicated  in  these  DPDA  look-ahead  transitions  refer 
to  DPDA  states. 
The look-ahead algorithm is invoked by the CFSM to DPDA
conversion algorithm for purposes of resolving inadequate
CFSM states. The look-ahead algorithm takes as its input
the CFSM, the number of the inadequate state to be resolved,
and  the  length,  k,  of  the  look-ahead  strings  by  which 
resolution  is  to  be  attempted.  The  output  of  the  algorithm 
is  a  set  of  look-ahead  transitions,  each  transition 
associating  a  k-symbol  look-ahead  string  with  a  destination 
state.  Recall  that  each  inadequate  CFSM  state  generates  a 
separate  DPDA  apply  state  for  each  of  its  apply 
transitions,  and  generates  a  single  DPDA  read  state  for  its 
set  of  read  transitions.  These  generated  states  are  the 
destination  states  of  the  look-ahead  transitions. 

The  look-ahead  algorithm  leaves  to  the  CFSM  to  DPDA 
conversion  algorithm  the  task  of  determining  whether  the 
inadequacy  has  been  resolved  for  the  particular  state  and 
value  of  k.  The  condition  for  resolution  is  simply  that  no 
look-ahead  string  may  occur  in  two  or  more  look-ahead 
transitions  having  different  destination  states.  If  the 
attempt  at  resolution  is  successful,  the  conversion 
algorithm  saves  the  look-ahead  state  and  its  transitions  and 
proceeds  to  attempt  resolution  of  the  next  inadequate  CFSM 
state. If resolution is not successful, the value of k is
increased by one and the look-ahead algorithm is once again
invoked. Resolution is attempted for k = 1, 2, and 3. If
resolution is unsuccessful with three-symbol look-ahead,
the attempt at state resolution is abandoned, error
diagnostics are issued, and the inadequate CFSM state
is designated unresolved.
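The resolution test and the k = 1, 2, 3 schedule can be sketched in Python as follows. The look-ahead computation itself is passed in as a function (compute_look_ahead is a hypothetical parameter), and the demonstration data are invented purely to show a conflict at k = 1 that disappears at k = 2.

```python
def resolved(transitions):
    """No look-ahead string may lead to two different destination states."""
    destination = {}
    for string, state in transitions:
        if destination.setdefault(string, state) != state:
            return False
    return True


def resolve(compute_look_ahead, inadequate_state, max_k=3):
    """Try k = 1, 2, 3; return (k, transitions), or None if the state stays unresolved."""
    for k in range(1, max_k + 1):
        transitions = compute_look_ahead(inadequate_state, k)
        if resolved(transitions):
            return k, transitions
    return None   # the caller would issue diagnostics and mark the state unresolved


# Hypothetical look-ahead transitions for one inadequate state, indexed by k
DEMO = {
    1: [(("b",), 10), (("b",), 11)],              # "b" leads to two states: a conflict
    2: [(("b", "c"), 10), (("b", "d"), 11)],      # resolved with two symbols
}
print(resolve(lambda state, k: DEMO.get(k, []), inadequate_state=7))
```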

The  essence  of  look-ahead  is  that,  for  each  destination 
state  (in  the  above  sense),  a  computation  is  performed  to 
determine  the  set  of  all  possible  transition  strings  of  k 
terminal  symbols  (look-ahead  strings)  that  can  be 
encountered,  given  entry  into  the  particular  destination 
state.  These  look-ahead  string/destination  state  pairs 
constitute  the  look-ahead  transitions  of  the  look-ahead 
state  for  the  particular  value  of  k. 

The  algorithm  operates  on  the  information  local  to  the 
inadequate  state  in  the  CFSM.  In  computing  the  transition 
symbol  strings,  the  algorithm  may  encounter  CFSM  read 
states, CFSM apply states, and inadequate CFSM states. Let
us assume that on entering one of these states, the
look-ahead string under construction is S, and that S has
been built up to a length of l symbols (l = 0, 1, 2). The
algorithm is thus looking to add k-l symbols to S, and its
action in each case is as follows:


CFSM  read  state 

Assume that the state has n transitions on
terminal symbols. Then the look-ahead string,
S, is duplicated n times, and for each of the
n transitions, the transition symbol is
concatenated to a copy of S, resulting in a
new transition symbol string, S', of length
l+1. Now, if l+1 = k, then S' is complete.
However, if l+1 < k, then the look-ahead
algorithm is reapplied at the destination
state of the transition. On reapplication,
the look-ahead string is S' and the algorithm
is looking to add k-l-1 symbols to S'.

CFSM  apply  state 

The CFSM apply state has an associated DPDA
apply state for which a set of look-back
transitions has been computed. The look-ahead
algorithm is reapplied, with S, to the
destination state of each look-back
transition.

CFSM inadequate state

The  look-ahead  algorithm  as  applied  to  a  CFSM 
inadequate  state  may  be  viewed  as  its  repeated 
application  to  as  many  CFSM  read  states  and 
CFSM  apply  states  as  necessary  to  represent 
all  transitions  of  the  inadequate  state,  each 
application  being  performed  with  S. 


With respect to the reapplication of the look-ahead
algorithm, we place the restriction that the algorithm not
be reapplied to a CFSM state to which it was previously
applied if the read transitions constituting the look-ahead
string up to the proposed reapplication are identical to
those up to the previous application. This condition
includes the null look-ahead string and prevents the
algorithm from entering infinite cycles.
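The three cases can be combined into a short recursive Python sketch. It assumes hand-built tables for a hypothetical grammar G -> E ;, E -> T + E | T, T -> a, in which CFSM state 3 is inadequate (apply E -> T, or read "+"). The visited set of (state, prefix) pairs implements the restriction above. There is no end-of-text marker in this sketch, so no string continues past the goal production, whose apply state has no look-back transitions. All names are ours.

```python
# Hypothetical tables for: P0 G -> E ;  P1 E -> T + E  P2 E -> T  P3 T -> a
TERMINALS = {";", "+", "a"}
READ = {1: {"E": 2, "T": 3, "a": 4}, 2: {";": 5}, 3: {"+": 6}, 6: {"E": 7, "T": 3, "a": 4}}
APPLIES = {3: [2], 4: [3], 5: [0], 7: [1]}        # CFSM state -> productions applied there
LOOK_BACK = {                                     # (state, production) -> {look-back: destination}
    (3, 2): {1: 2, 6: 7}, (4, 3): {1: 3, 6: 3}, (5, 0): {}, (7, 1): {1: 2, 6: 7},
}


def extend(state, prefix, k, visited, out):
    """Add to `out` every k-symbol terminal string extending `prefix` after entering `state`."""
    if (state, prefix) in visited:      # same state, same read transitions so far: a cycle
        return
    visited.add((state, prefix))
    for sym, dst in READ.get(state, {}).items():          # read part of the state
        if sym in TERMINALS:
            longer = prefix + (sym,)
            if len(longer) == k:
                out.add(longer)
            else:
                extend(dst, longer, k, visited, out)
    for p in APPLIES.get(state, []):                      # apply part: follow look-back
        for dst in set(LOOK_BACK[(state, p)].values()):
            extend(dst, prefix, k, visited, out)


def look_ahead_transitions(state, k):
    """Look-ahead transitions of an inadequate CFSM state for a given k."""
    result = []
    out, visited = set(), set()
    for sym, dst in READ.get(state, {}).items():          # destination: the generated read state
        if sym in TERMINALS:
            if k == 1:
                out.add((sym,))
            else:
                extend(dst, (sym,), k, visited, out)
    result += [(s, ("read", state)) for s in sorted(out)]
    for p in APPLIES.get(state, []):                      # destination: one apply state each
        out, visited = set(), set()
        for dst in set(LOOK_BACK[(state, p)].values()):
            extend(dst, (), k, visited, out)
        result += [(s, ("apply", state, p)) for s in sorted(out)]
    return result


print(look_ahead_transitions(3, 1))
```

For k = 1 the read destination receives "+" and the apply destination receives ";", so state 3 is resolved with one-symbol look-ahead.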

The  look-ahead  algorithm  is  conceptually  quite  simple. 
The  complexity  of  its  implementation  comes  from  doing  the 
accounting  associated  with  keeping  track  of  each  look-ahead 
string  and  the  states  that  have  been  entered  at  each  level 
on  its  behalf.  As  an  accounting  aid,  we  have  found  the 
concept  of  look-ahead  contours  to  be  useful.  A  look-ahead 
contour  represents  a  set  of  look-ahead  symbols  for  each 
symbol level (value of k). Each contour contains a number
of states, and transitions between states are either within
the same contour (reapplication), or from one contour to the
next higher contour (read transitions). A dashed line
transition represents the reapplication of the algorithm
based on the occurrence of an apply transition. A solid line
transition represents the application of the look-ahead
algorithm to a read transition, the transition symbol being
the label of the transition. The look-ahead strings are
defined by the labels on the read transitions of continuous
directed transition sequences.

In Figures III.1 and III.2, we use the look-ahead
contours in computing the look-ahead transitions of the two
look-ahead states of our example grammar. In the process of
contour generation, we label states in the following way:

S(C, D)
In this notation, S may be R (CFSM read state), A (CFSM
apply state), or I (CFSM inadequate state). C is the number
of a CFSM state, and D is the number of a DPDA state. We
also  use  the  notation  to  represent  the  DPDA  read  states  and 
the  DPDA  apply  states  generated  from  inadequate  CFSM  states. 

Our example grammar is CLR(1), and its inadequacies
can thus be resolved by our algorithm with one-symbol
look-ahead. In Figure III.1, we simply perform the one-symbol
look-ahead for the first inadequate state. In Figure
III.2, we generate two contours for the second inadequate
state,  so  as  to  give  a  more  thorough  example  of  the 
look-ahead  algorithm  in  action. 


III.E The Algorithm for Optimizing the DPDA


The  algorithms  that  we  discuss  in  this  section  have  to 
do  with  the  optimization  of  the  contents  of  the  DPDA  without 
regard  for  its  representation  in  a  particular  computing 
environment.  That  is,  transformations  are  performed  on  the 
DPDA  that  remove  superfluous  and  redundant  information  so 
that  the  resulting  DPDA  is  more  efficient  than  its 
predecessor.  The  optimizations  that  have  been  implemented 
by  no  means  exhaust  the  potential  for  DPDA  content 
optimization.  Other  optimizations,  such  as  transition 
sorting  according  to  empirical  measures  of  frequency  of 
transition  occurrence  for  a  particular  language,  and 
detection  and  deletion  of  apply  states  that  have  no 
associated  semantics  and  that  do  not  modify  the  DPDA  state 
stack,  are  but  a  few  of  the  optimizations  that  could  have 
significant  impact  on  parser  space  and  time  efficiency. 

The  representation  of  the  DPDA  read  states  is  such  that 
the  information  regarding  the  states  themselves  is  stored 
separately from the information on the states' transitions.
This  being  the  case,  we  can  optimize  the  read  transitions  by 
deleting  duplicate  transition  sequences  that  may  arise  from 
different  read  states. 
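One way to delete duplicate transition sequences, sketched in Python: transitions live in a single table, each read state records only a (start, count) pair, and states whose sequences are identical share one copy. This representation is our assumption (the Multics layout is described in Appendix B), only exact duplicates are shared, and the sample read states are hypothetical.

```python
def share_transition_lists(read_states):
    """Store transitions in one table; states with identical sequences share one copy."""
    table, index, first_copy = [], {}, {}
    for state, transitions in read_states.items():
        key = tuple(transitions)
        if key not in first_copy:
            first_copy[key] = len(table)
            table.extend(transitions)
        index[state] = (first_copy[key], len(transitions))   # (start, count)
    return table, index


# Hypothetical DPDA read states (terminal transitions only)
READ_STATES = {
    1: [("(", 2), ("a", 3)],
    2: [("(", 2), ("a", 3)],
    4: [(")", 6), (",", 7)],
    7: [("(", 2), ("a", 3)],
}
table, index = share_transition_lists(READ_STATES)
print(len(table), index)
```

Here eight stored transitions shrink to four, since read states 1, 2, and 7 share one sequence.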


There are two fundamental optimizations that are
performed on the DPDA apply states. First, for each apply
state, we determine the most popular look-back transition
destination state, and designate it the default
destination state. Then all look-back transitions of the
state whose destination state is the default destination
state are deleted from the list of look-back transitions.
The default destination state is then appended to the list,
it being the convention that, during parsing, should the top
of the state stack (after being popped) fail to match any of
the look-back states in the list for the current apply
state, then the transition to the default destination state
is automatically taken. Since the CLR(k) parser is
deterministic, we are guaranteed not to introduce any errors
by performing this optimization.
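The default-destination optimization can be sketched in Python as follows. The same routine would serve for the look-ahead states of primary grammars, with look-ahead symbols in place of look-back states. The list-of-pairs representation, the None key for the appended default, and the function names are our assumptions; ties in popularity go to the destination encountered first.

```python
from collections import Counter


def add_default(transitions):
    """Delete transitions to the most popular destination and append it as a default.

    transitions: list of (key, destination) pairs, where key is a look-back state
    (or a look-ahead symbol). The appended default entry has key None.
    """
    if not transitions:
        return []
    default = Counter(dst for _, dst in transitions).most_common(1)[0][0]
    return [(key, dst) for key, dst in transitions if dst != default] + [(None, default)]


def destination(transitions, key):
    """Scan the list in order; the appended default entry matches anything."""
    for entry_key, dst in transitions:
        if entry_key is None or entry_key == key:
            return dst
    return None


optimized = add_default([(1, 2), (6, 7), (9, 7)])
print(optimized)
print(destination(optimized, 6), destination(optimized, 1))
```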

The second optimization that we apply to the DPDA apply
states is analogous to the optimization applied to the DPDA
read states: duplicate look-back transition sequences
arising from different apply states are deleted, thus
removing redundant information.

The  number  of  optimizations  performed  on  the  DPDA 
look-ahead  states  is  one  or  two,  depending  on  whether  the 
grammar  in  question  is  lexical  or  primary,  respectively.  In 
either  case,  duplicate  look-ahead  transitions  for  a  given 
look-ahead  state  are  deleted.  In  the  case  of  a  parser 
computed  from  a  primary  grammar,  an  additional  optimization 
is  performed  on  the  look-ahead  states  which  is  analogous  to 
the first of the optimizations applied to the DPDA apply
states. Thus, for each look-ahead state, we determine the
most popular look-ahead transition destination state, and
designate it the default destination state. Any
look-ahead transition whose destination state matches the
default destination state is deleted from the list of
look-ahead transitions for the state in question, and the
default destination state is appended to the end of the
list. The parsing interpretation of the default look-ahead
transition  is  analogous  to  the  parsing  interpretation  of  the 
default  apply  transition.  However,  in  the  present  case,  the 
detection  of  an  erroneous  symbol  in  the  input  stream  will  be 
delayed  until  a  subsequent  read  state,  whereas  were  the 
default  destination  optimization  not  performed,  such  an 
error  would  be  detected  in  the  look-ahead  state. 

The above optimizations have been applied to the
initial DPDA to produce the final DPDA given at the end of
the chapter. Note that in this final DPDA we have replaced
all parenthesized CFSM state numbers with the corresponding
DPDA state numbers.

In  addition  to  the  DPDA  content  optimizations,  a 
significant  improvement  in  parser  space  and  time  efficiency 
may be realized by "fine tuning" the DPDA to the particular
computing system on which the parser is to be executed.
This may involve the packing of integer fields into bit
strings, the hashing of the various state transitions, and
even  the  generation  of  an  assembly  code  representation  of 
the  DPDA  and  its  control  procedure.  We  hesitate  to  make 
generalizations  about  the  type  of  representation 
optimizations  that  can  be  performed,  since  the  range  of 
possibilities  is  limited  only  by  one's  imagination  and  the 
space-time  tradeoffs  inherent  in  the  computing  environment 
under  consideration.  Readers  interested  in  the 
optimizations  that  we  have  performed  in  the  representation 
of  the  DPDAs  on  Multics  are  referred  to  Appendix  B. 
III.F The CLR(1) Parsing Algorithm


In this section, we present the basic CLR(1) parsing
algorithm that "drives" the language processors produced by
LIS. We present the CLR(1) algorithm because of its
simplicity and because it has been our experience that
one-symbol look-ahead is sufficient for most applications.
Extension of the algorithm to CLR(k) for k > 1 is
straightforward. Readers wishing more detail on the CLR(1)
algorithm are referred to Section B.2.II of Appendix B.
Examples  of  the  execution  of  the  parser  on  text  of  our 
example  grammar  are  given  at  the  end  of  the  chapter. 

The CLR(1) parser is an extension of the stack
algorithm presented in Section III.D. The stack algorithm
is extended to incorporate the look-back transitions of
apply states and the look-ahead transitions of look-ahead
states. In the following discussion, we assume that we are
driving a primary parser, and that a lexical parser exists
that provides lexical constructs on demand (of course, our
algorithm may also be adapted to lexical parsing, as
indicated in Appendix B). We fetch a construct by invoking
the procedure FETCH_CONSTRUCT, the fetched construct being
placed in CONSTRUCT. Our discussion also makes reference to
two stacks, a DPDA state stack and a text reference stack.
The DPDA state stack is the stack of DPDA read states that
is maintained so as to implement the look-back transitions
of apply states. The text reference stack contains the
lexical constructs as recognized by the lexical parser, and
represents the primary interface of the language semantics
with the input text. The use of this stack is explained in
Appendix A.


The CLR(1) parsing algorithm:

1. Initialization

Perform the initialization semantics specified
in the language definition.

FETCH_CONSTRUCT; have_construct = "yes".

Clear DPDA state stack. Clear text reference
stack.

Go to step 3.

2. Next State

If STATE is a DPDA read state, go to step 3.

If STATE is a DPDA apply state, go to step 4.

If STATE is a DPDA look-ahead state, go to
step 5.

3. DPDA Read State

If have_construct = "no", FETCH_CONSTRUCT.
have_construct = "no".

Push STATE onto DPDA state stack.

Push CONSTRUCT onto text reference stack.

Scan transitions of STATE looking for a match
with CONSTRUCT.

If a match is not found, then a syntax error has
been detected, so exit to the error reporting and
recovery procedure (Section IV.C).

Otherwise, set STATE to the transition
destination state of the matching transition.
Go to step 2.

4. DPDA Apply State

If semantics is associated with the BNF rule
to which the production to be applied belongs,
activate the semantics.

Pop the DPDA state stack as many times as
indicated for STATE.

If the production being applied defines
<primary_non_terminal>, then processing is
complete, so exit.

Scan look-back transitions for STATE looking
for a match with the top of the DPDA state
stack.

If a match is found, set STATE to the
destination state of the matching look-back
transition.

If a match is not found, set STATE to the
default destination state for STATE.

Go to step 2.

5. DPDA Look-Ahead State

If have_construct = "no", FETCH_CONSTRUCT.
have_construct = "yes".

Scan look-ahead transitions for STATE, looking
for a match with CONSTRUCT.

If a match is found, set STATE to the
destination state of the matching look-ahead
transition.

If a match is not found, set STATE to the
default destination state of STATE.

Go to step 2.
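The five steps can be sketched as a Python driver over a hand-built, hypothetical final DPDA for the grammar G -> E ;, E -> T + E | T, T -> a, with G playing the role of <primary_non_terminal>. FETCH_CONSTRUCT is simulated by reading from a token list, semantics activation is replaced by recording the production applied, and the text reference stack is omitted. All state numbers, table names, and the ParseError class are assumptions.

```python
# Hypothetical final DPDA for: P0 G -> E ;  P1 E -> T + E  P2 E -> T  P3 T -> a
READ = {1: {"a": 4}, 2: {";": 5}, 6: {"a": 4}, 8: {"+": 6}}
APPLY = {   # state -> (production, pops, explicit look-back list, default destination)
    4: ("T -> a", 0, [], 3),
    5: ("G -> E ;", 1, None, None),       # defines the goal symbol: processing complete
    7: ("E -> T + E", 2, [(6, 7)], 2),
    9: ("E -> T", 0, [(6, 7)], 2),
}
LOOK_AHEAD = {3: ([("+", 8)], 9)}         # state -> (explicit look-ahead list, default)


class ParseError(Exception):
    pass


def clr1_parse(tokens):
    stream = iter(tokens)

    def fetch_construct():                                  # stands in for FETCH_CONSTRUCT
        return next(stream, None)                           # None marks the end of the text

    applied = []
    construct, have_construct = fetch_construct(), True    # step 1
    state_stack, state = [], 1
    while True:
        if state in READ:                                   # step 3: read state
            if not have_construct:
                construct = fetch_construct()
            have_construct = False
            state_stack.append(state)
            if construct not in READ[state]:
                raise ParseError(f"unexpected {construct!r} in read state {state}")
            state = READ[state][construct]
        elif state in APPLY:                                # step 4: apply state
            production, pops, look_back, default = APPLY[state]
            applied.append(production)                      # in place of activating semantics
            del state_stack[len(state_stack) - pops:]       # slicing [-0:] would clear the stack
            if look_back is None:
                return applied
            top = state_stack[-1]
            state = next((dst for s, dst in look_back if s == top), default)
        else:                                               # step 5: look-ahead state
            if not have_construct:
                construct = fetch_construct()
            have_construct = True
            look_ahead, default = LOOK_AHEAD[state]
            state = next((dst for sym, dst in look_ahead if sym == construct), default)


print(clr1_parse(["a", "+", "a", ";"]))
```

Because state 3 has a default look-ahead transition, the erroneous input a a ; is not rejected in the look-ahead state but only later, in read state 2, which is the delayed detection described in Section III.E.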
Chapter IV


Conclusions 


IV.A Introduction

In this chapter, we consider a number of issues. In
Section IV.B, we present empirical evidence supporting our
previous claims regarding the efficiency of CLR(k) parsing.
In Section IV.C, we discuss the design of an important
enhancement to the Language Implementation System, namely
syntax directed error detection, reporting, and recovery
procedures. In Section IV.D, we briefly discuss the
hierarchy of LR(k) systems and indicate the position of the
CLR(k) grammars within this hierarchy. In Section IV.E, we
consider some of the significant language developments in
which LIS has been utilized. In Section IV.F, we discuss
areas of future research and development that may be
expected to have significant impact on language
implementation system technology.

IV.B The Efficiency of CLR(k) Parsers

Efficiency  of  parsing  strategies  is  typically  analyzed 
along  two  dimensions,  space  and  time.  Space  efficiency 
refers to the space requirements of the parsing strategy,
whereas time efficiency refers to parsing speed. In this
section, we investigate the space and time efficiencies of
the CLR(k) parsers produced by LIS. Our presentation will
be in three parts. First, we report on the empirical
efficiency comparisons made at the University of Toronto
between the parsers produced by their LALR(k) generator and
parsers produced by popular precedence methods. Then we
report on the efficiency characteristics of the primary
parsers produced by LIS. Finally, we indicate the way in
which we have adapted our CLR(k) strategy to the production
of efficient lexical parsers.

IV.B.1  The Toronto LALR(k)/Precedence Comparisons

The  most  meaningful  empirical  investigations  into  the 
efficiency  of  parsing  strategies  are  those  which  compare 
alternative  strategies  across  a  common  base  of  languages  in 
a common computing environment. Unfortunately, this type of
comparison was not possible in the case of LIS, since no
other automatic strategies exist on Multics. However, the
Computer Systems Research Group at the University of Toronto
performed exactly these types of comparisons on an IBM
System/360 (Model 44). The comparisons were made among the
LALR(k) parsers (see Section IV.D) produced by their system
(Lal 71, Hola 71), the mixed strategy precedence of XPL (MHW
70), and Wirth-Weber simple precedence (WW 66). The
Toronto  comparisons  are  relevant  to  LIS  because,  for  the 
cases  considered,  the  resolution  of  the  CFSM  inadequate 
states  by  the  LALR(k)  algorithm  is  equivalent  to  the 
resolution  by  the  CLR(k)  algorithm.  Leaving  out  the 
details, we reproduce their results in Figures IV.1 and
IV.2. As indicated, the LALR(k) strategy is significantly
more  efficient,  both  in  space  and  time,  than  the  precedence 
methods.  The  important  result  of  their  investigations, 
however,  is  not  the  degree  to  which  LALR(k)  is  more 
efficient,  but  that  they  compare  "very  favorably  in 
efficiency  with  precedence  methods  which  have  themselves 
proved  to  be  quite  acceptable  in  practice.  We  conclude  that 
efficiency  is  not  an  objection  to  LR(k)-based  techniques". 

IV.B.2  CLR(k) Primary Parser Efficiency

In  this  section,  we  report  on  the  empirical 
measurements  of  the  space  and  time  efficiency  of  selected 
CLR(k)  parsers  produced  by  LIS.  The  measurements  were  taken 
on  Multics  (H645)  when  the  system  was  simultaneously 
supporting  20  users,  and  configured  with  one  central 
processing  unit  and  384,000  words  (36  bits/word)  of  main 
memory.  The  LIS  Processor  Control,  which  included  the 
control for lexical analysis, occupied 905 words. In
measuring  time  efficiency,  programs  containing  not  less  than 
12,000  tokens  were  used,  and  measurements  were  taken  over 
several  executions  and  the  results  averaged. 


The  languages  for  which  we  report  our  results  are  the 
primary  grammars  of  FILETRAN,  SCHEMA,  and  PL/I  (see 
discussions  in  Section  IV. E).  Our  measurements  of  DPDA  size 
exclude  the  requirements  of  the  key-symbol  table  of  the 
associated  grammar. 

                              FILETRAN     SCHEMA       PL/I
Grammar       Productions          337        432        358
              Non-Terminals        135        184        139
              Terminals            169         99        135
DPDA Size     States               855        807        768
              Words               1553       1520       1717
Parse Speed   Tokens/Minute    145,000    100,000     90,000
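The figures above can be normalized to make the comparison easier to read. The following is a small arithmetic sketch, assuming the SCHEMA DPDA size is 1,520 words; it derives words per DPDA state and tokens per second from the reported values and nothing else.

```python
# Reported measurements for the three primary grammars.
measurements = {
    "FILETRAN": {"states": 855, "words": 1553, "tokens_per_minute": 145_000},
    "SCHEMA":   {"states": 807, "words": 1520, "tokens_per_minute": 100_000},
    "PL/I":     {"states": 768, "words": 1717, "tokens_per_minute": 90_000},
}


def derived(m):
    # Words per state indicates DPDA density; tokens/second is parse speed.
    return {
        "words_per_state": m["words"] / m["states"],
        "tokens_per_second": m["tokens_per_minute"] / 60,
    }


for name, m in measurements.items():
    d = derived(m)
    print(f"{name:9s} {d['words_per_state']:.2f} words/state  "
          f"{d['tokens_per_second']:.0f} tokens/second")
```

This prints roughly 1.82, 1.88, and 2.24 words per state, and about 2417, 1667, and 1500 tokens per second, for FILETRAN, SCHEMA, and PL/I respectively.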


As  with  the  Toronto  comparisons,  the  essential  point  of 
these results is that the parsers produced by LIS are quite
acceptable on efficiency criteria.

IV.B.3  CLR(k) Lexical Parser Efficiency


We  have  been  successful  in  adapting  the  CLR(k)  parsers 
produced  by  LIS  to  the  production  of  efficient  lexical 
parsers. This has been possible because of several factors:

a.  The  phrase  structure  of  lexical  constructs  is 
generally  of  little  interest,  the  task  of 
lexical  analysis  being  limited  to  the 
efficient  recognition  of  rather  simple 
constructs.

b.  The  constructs  defined  by  the  lexical  grammar 
typically  include  sets  of  structurally 
equivalent  terminal  characters,  such  as  the 
set of the lower case letters, and the set of
the decimal digits from zero to nine.

c.  The  sets  of  structurally  equivalent  terminal 
characters  have  consecutive  numeric  character 
code representations on most computers.


Taking  advantage  of  these  factors,  we  were  able  to 
modify  the  DPDAs  and  the  lexical  parser  control  to  admit 
read  transitions  and  look-ahead  transitions  of  the  form, 
S->F.  The  parsing  interpretation  of  such  a  transition  is 
that  a  terminal  character  whose  numeric  value  lies  between 
the  numeric  values  of  S  and  F  (inclusively)  satisfies  the 
condition  of  the  transition.  Furthermore,  when  a  character 
has  been  found  that  satisfies  the  condition,  all  subsequent 
characters  satisfying  the  condition  are  also  accepted  prior 
to  taking  the  transition.  The  utilization  of  these  "special 
lexical  encodings"  in  developing  lexical  grammars  and  their 
parsers  is  discussed  in  Appendix  A. 
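The interpretation of an S->F transition can be sketched as follows. This is an illustrative sketch, not the LIS lexical parser control: it assumes a character set in which each range is contiguous (as the text notes for most computers), and the function name `take_range_transition` and the `(low, high, destination)` tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# A range transition S->F, written (S, F, destination state).
RangeTransition = Tuple[str, str, int]


def take_range_transition(text: str, pos: int,
                          transitions: List[RangeTransition]) -> Optional[Tuple[int, int]]:
    """Return (destination, new position), or None if no range matches text[pos]."""
    if pos >= len(text):
        return None
    ch = text[pos]
    for low, high, dest in transitions:
        # The condition holds when the character code lies between S and F inclusive.
        if ord(low) <= ord(ch) <= ord(high):
            # Accept every following character that satisfies the same
            # condition before actually taking the transition.
            end = pos + 1
            while end < len(text) and ord(low) <= ord(text[end]) <= ord(high):
                end += 1
            return dest, end
    return None
```

For example, with transitions `[("a", "z", 1), ("0", "9", 2)]`, scanning `"count42 "` from position 0 consumes the whole run `count` in a single transition to state 1 and stops at position 5.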
A rather large lexical grammar is the lexical grammar
of  PL/I  given  in  Appendix  D.  The  following  empirical 
measurements  of  the  efficiency  of  the  PL/I  lexical  parser 
produced  by  LIS  were  taken  on  Multics  under  the  same 
conditions  that  prevailed  during  the  measurements 
discussed  in  the  previous  section. 

Grammar:      36 Productions
              [illegible] Non-Terminals
              73 Terminals

DPDA Size:    73 States
              136 Words

Parse Speed:  80,000 Characters/Minute

The  space  efficiency  of  the  parser  is  quite  good. 
While  the  time  efficiency  is  certainly  acceptable,  there 
exist  additional  optimizations  that  may  be  performed  to 
increase  this  efficiency  even  further.  First,  the 
key-symbol table is presently searched linearly, so that
hashing the table will result in significant increases
in lexical parsing speed. Second, we can perform
optimizations on the DPDA which will eliminate apply states
that apply unit productions (Pag 73). Finally, on a
computer such as the IBM System/360 or the IBM System/370,
we can translate the entire lexical parser into an assembly
language program, and employ the highly efficient translate
and test instruction in implementing the read and look-ahead
states. This optimization will be at a slight expense in
space efficiency, but will enable the speed of CLR(k)
lexical parsers to compare favorably with the best
hand-coded assembly language alternatives.
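The first optimization, hashing the key-symbol table, can be illustrated with a short sketch. The key symbols below are hypothetical PL/I-like examples, and the numeric codes are simply table positions; a Python dictionary stands in for whatever hash scheme an implementation would actually use.

```python
from typing import Dict, List, Optional

# Hypothetical key-symbol table; each symbol's code is its position.
KEY_SYMBOLS: List[str] = ["begin", "call", "declare", "do", "end", "go", "if", "then"]


def linear_lookup(symbol: str, table: List[str]) -> Optional[int]:
    # Present scheme: compare against each entry in turn (O(n) per lookup).
    for code, key in enumerate(table):
        if key == symbol:
            return code
    return None


def build_hashed_table(table: List[str]) -> Dict[str, int]:
    # Proposed scheme: build a hash table once, for O(1) expected lookups.
    return {key: code for code, key in enumerate(table)}


HASHED = build_hashed_table(KEY_SYMBOLS)


def hashed_lookup(symbol: str) -> Optional[int]:
    # Identifiers that are not key symbols simply return None.
    return HASHED.get(symbol)
```

Both lookups return the same codes; only the cost per identifier scanned changes, which matters because every identifier-like token must be checked against the table.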
IV.C  Syntax Directed Error Detection, Reporting, and Recovery

Users  of  a  particular  artificial  language  will 
inevitably  make  (context-free)  syntax  errors  in  the  text 
submitted  for  processing.  In  response  to  such  errors,  the 
language processor must be able to detect the errors, report
the errors in an intelligible form, and recover from the
errors in such a way that processing may continue. These
error handling procedures must be localized to the input
text immediately surrounding a particular error, so that
recovery from that error will not result in skipping over
large  portions  of  the  input  text.  Furthermore,  the  error 
handling  procedures  should  have  appropriate  termination 
conditions  so  as  not  to  produce  an  avalanche  effect  when 
particularly  difficult  errors  are  encountered.  In  this 
section,  we  discuss  the  basic  design  of  the  error  handling 
procedures  planned  for  LIS.  The  design  that  we  outline  is 
an  extension  of  the  work  by  James  (Jam  71),  which  is,  in 
turn,  an  extension  of  the  work  done  by  Leinius  (Lei  70)  on 
syntax  directed  error  handling. 

IV.C.1  Error Detection

Error detection in the CLR(k) parser presented in
Section III.F is straightforward: an error exists in the
input text whenever the parser enters a read state and the
current input symbol does not match any of the transition
symbols of the state. Were it not for the default
transition optimization that is performed on look-ahead
states, errors would be detected in these states as well.
However, because of the optimization, errors that would
ordinarily be detected in a look-ahead state will go
undetected until the next read state is entered. In the
meantime,  the  parser  may  have  entered  apply  states, 
resulting  in  the  execution  of  semantics  and  the  popping  of 
the  DPDA  state  stack.  The  non-deterministic  execution  of 
semantics  causes  problems  for  subsequent  language 
processing,  while  the  stack  popping  complicates  the  recovery 
procedures.  Our  solution,  therefore,  is  not  to  perform  the 
default  look-ahead  transition  optimizations,  so  that  the 
resulting  CLR(k)  parser  will  detect  context-free  syntax 
errors  "as  soon  as  they  occur". 
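The effect of the default look-ahead transitions on detection can be demonstrated on a toy automaton. This is a simplified sketch under stated assumptions: the four-state DPDA fragment is hypothetical (not from LIS), and apply states jump to a fixed successor instead of popping the state stack.

```python
from typing import Dict, List, Tuple

# Hypothetical DPDA fragment:
#   read:      consumes the symbol on a matching transition
#   lookahead: inspects the symbol without consuming it
#   apply:     stands for applying a production (semantics would run here)
STATES: Dict[int, dict] = {
    0: {"kind": "read", "trans": {"a": 1}},
    1: {"kind": "lookahead", "trans": {";": 2}, "default": 2},
    2: {"kind": "apply", "rule": "R1", "next": 3},
    3: {"kind": "read", "trans": {";": 4}},
    4: {"kind": "accept"},
}


def parse(states: Dict[int, dict], symbols: List[str],
          use_defaults: bool = True) -> Tuple[str, int, List[str]]:
    """Return (outcome, state where parsing stopped, productions applied)."""
    pos, state, applied = 0, 0, []
    while True:
        st = states[state]
        if st["kind"] == "accept":
            return "accept", state, applied
        if st["kind"] == "apply":
            applied.append(st["rule"])  # semantics would execute here
            state = st["next"]          # simplification: no stack pop
            continue
        sym = symbols[pos] if pos < len(symbols) else "<eof>"
        dest = st["trans"].get(sym)
        if dest is None and st["kind"] == "lookahead" and use_defaults:
            dest = st.get("default")    # default-transition optimization
        if dest is None:
            return "error", state, applied
        if st["kind"] == "read":
            pos += 1
        state = dest
```

On the erroneous input `["a", "x"]`, the optimized automaton applies R1 (running its semantics) and only reports the error in read state 3, whereas with `use_defaults=False` the error is reported in look-ahead state 1 before any semantics execute.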

The  ability  of  the  CLR(k)  parser  to  detect  errors  "as 
soon  as  they  occur"  is  a  significant  one,  since  there  are 
many  parsing  schemes  in  which  this  ability  does  not  exist. 
Thus,  for  example,  in  the  case  of  the  precedence  methods,  it 
is  entirely  possible  for  consistent  precedence  relations  to 
exist  within  a  handle  that  does  not  match  the  right  side  of 
any  production,  and  which,  therefore,  contains  an  error. 
Error detection, reporting, and recovery become
significantly more complex as a result (Lei 70). The
detection capability of the CLR(k) parser is therefore of
significant value, since it properly initiates the error
handling procedures.

IV.C.2  Error Reporting

The  minimum  information  that  should  be  delivered  as  an 
error  diagnostic  includes  the  point  in  the  input  text  at 
which  the  error  was  detected,  the  symbol  encountered  at 
that  point,  and  the  set  of  symbols  that  could  legitimately 
be  accepted  at  that  point.  It  should  be  obvious  that  this 
information  is  available,  given  the  detection  capability 
discussed  in  the  previous  section.  However,  in  addition  to 
this basic information (based strictly on the language's
terminal symbols), it is also desirable to deliver
information on the phrase structure surrounding the
detected error. This information can be delivered if the
non-terminal transitions are retained in the DPDA.
Reporting of the phrase structure up to the point of the
error would then be accomplished by simply going down the
DPDA state stack and reporting the symbols (terminal and
non-terminal) that access the stacked states. Likewise, the
non-terminal transitions may be used to suggest
possibilities for acceptable phrases beginning at the
point of the error. We conclude, therefore, that proper
error reporting can be automatically performed for the
CLR(k) parser, but only if the non-terminal transitions are
retained in the DPDA. So as to maintain the space
efficiency of the parser, however, we elect to enter this
non-terminal transition information into a separate "error
handling" segment, to be referenced only when handling
errors.
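The reporting procedure can be sketched as follows, assuming a hypothetical layout in which `accessing` maps each stacked state to the symbol that accesses it and `error_segment` plays the role of the separate "error handling" segment. The data structures and the message format are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ErrorInfo:
    # Hypothetical "error handling" segment entry for one read state.
    terminals: Set[str]     # terminal transitions (also present in the DPDA)
    nonterminals: Set[str]  # non-terminal transitions, kept only here


def report_error(stack: List[int], accessing: Dict[int, str],
                 error_segment: Dict[int, ErrorInfo],
                 position: int, found: str) -> str:
    info = error_segment[stack[-1]]
    # stack[0] is the initial state, which has no accessing symbol; reading
    # the rest bottom to top lists the phrase structure left to right.
    context = " ".join(accessing[s] for s in stack[1:])
    return (f"error at symbol {position}: found {found!r}\n"
            f"  parsed so far: {context}\n"
            f"  expected one of: {', '.join(sorted(info.terminals))}\n"
            f"  or a phrase beginning: {', '.join(sorted(info.nonterminals))}")
```

For a stack `[0, 3, 7, 12]` whose states are accessed by `<declaration>`, `IF`, and `<expression>`, the report lists that context followed by the terminals and non-terminals acceptable in state 12.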

IV.C.3  Error Recovery

Having  developed  procedures  to  handle  automatically  the 
task  of  context-free  syntax  error  detecting  and  reporting, 
we  now  turn  our  attention  to  the  problem  of  error  recovery. 
We  incorporate  two  algorithms  for  error  recovery  into  our 
CLR(k)  error  handling  procedures.  In  the  first,  detection 
of  an  error  causes  a  check  to  be  made  to  determine  whether 
the  error  is  likely  the  result  of  a  key-symbol  spelling 
error.  This  could  occur  in  the  parse  at  a  point  at  which  a 
certain  set  of  key-symbols  is  acceptable,  but  at  which  an 
identifier  (in  the  usual  sense)  or  an  unacceptable 
key-symbol  is  actually  encountered.  In  such  a  situation,  we 
use  well  developed  spelling  comparison  algorithms  (Mor  70) 
to compare the encountered symbol against the possible
key-symbol transitions of the current state. If the
comparison is positive for exactly one of the transitions, a
spelling correction diagnostic is issued, the transition is
taken, and the parse continues. Otherwise, the recovery
attempt  proceeds  to  the  second  recovery  algorithm. 
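A sketch of the key-symbol spelling check follows. It treats two spellings as comparable when they differ by a single substitution, insertion, deletion, or adjacent transposition; this is an assumed, simplified test and may differ in detail from the algorithms cited (Mor 70). The function names are hypothetical.

```python
from typing import Iterable, Optional


def one_edit_apart(a: str, b: str) -> bool:
    """True if a and b differ by one substitution, insertion, deletion,
    or transposition of adjacent characters."""
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # single substitution
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])
    # Lengths differ by one: deleting one character of the longer string
    # must yield the shorter one.
    if len(a) < len(b):
        a, b = b, a
    return any(a[:i] + a[i + 1:] == b for i in range(len(a)))


def correct_key_symbol(found: str, acceptable: Iterable[str]) -> Optional[str]:
    matches = [k for k in acceptable if one_edit_apart(found.upper(), k.upper())]
    # Correct only when exactly one key symbol matches; otherwise the caller
    # falls through to phrase structure recovery.
    return matches[0] if len(matches) == 1 else None
```

For instance, `THNE` is corrected to `THEN` when `THEN` and `ELSE` are acceptable, but `GO` is left uncorrected when both `DO` and `TO` are acceptable, since the match is ambiguous.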

The  second  error  recovery  algorithm  is  based  on  the 
phrase  structure  of  the  input  text,  and  is  thus  considered  a 
phrase  structure  recovery  algorithm.  The  algorithm  first 
isolates  the  portion  of  the  input  text  containing  the  error. 
It  relies  on  the  error  reporting  procedure  discussed  in  the 
previous  section  to  identify  the  phrase  that  was  in  the 
process  of  being  parsed  when  the  error  was  detected.  The 
basic  approach  of  the  algorithm  is  to  search  down  the  DPDA 
state  stack  looking  for  a  state  containing  a  transition  on  a 
non-terminal  to  which  the  partial  phrase  could  be  reduced. 
Given  such  a  state,  it  takes  the  non-terminal  transition 
satisfying  the  condition  and  continues  the  parse  until  it 
leads  to  a  DPDA  read  state.  It  then  scans  the  input  text 
until  a  terminal  symbol  is  found  that  matches  one  of  the 
transition  symbols  from  the  read  state.  The  intervening 
text  is  skipped  over,  and  the  parse  is  resumed. 
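The basic phrase structure recovery can be sketched as below. This sketch makes simplifying assumptions not in the original: every candidate non-terminal transition leads directly to a read state (a real DPDA may pass through apply states first), candidates for each state are supplied already ordered, and the skip bound `max_skip` is a hypothetical heuristic parameter.

```python
from typing import Dict, List, Optional, Set, Tuple


def recover(stack: List[int],
            candidates: Dict[int, List[str]],
            goto: Dict[Tuple[int, str], int],
            read_terminals: Dict[int, Set[str]],
            tokens: List[str],
            pos: int,
            max_skip: int = 50) -> Optional[Tuple[List[int], int]]:
    """Return (new stack, position to resume reading), or None."""
    # Search down the DPDA state stack, starting from the top.
    for depth in range(len(stack) - 1, -1, -1):
        state = stack[depth]
        for nonterminal in candidates.get(state, []):
            dest = goto.get((state, nonterminal))
            # Simplification: the transition must lead straight to a read state.
            if dest is None or dest not in read_terminals:
                continue
            # Skip input until a terminal the read state can accept.
            limit = min(len(tokens), pos + max_skip + 1)
            for resume in range(pos, limit):
                if tokens[resume] in read_terminals[dest]:
                    return stack[:depth + 1] + [dest], resume
    return None
```

With a hypothetical stack `[0, 5, 9]` (statement start, after `IF`, inside an expression), an error in the expression recovers to `<expression>` if a `THEN` follows; otherwise the search falls to state 0 and recovers to `<statement>` at the next `;`.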

The  above  recovery  algorithm  is  in  need  of  refinement 
before  it  can  be  considered  operational.  The  first 
refinement  has  to  do  with  the  way  in  which  the  algorithm 
determines the possible non-terminals to which the partial
phrase may be reduced. Only certain non-terminal
transition symbols of certain states may be considered
candidates. For a particular stacked state, the candidate
non-terminal transition symbols are determined as follows:

The  set  of  candidate  non-terminal  transition 
symbols  of  the  state  is  initialized  to  those 
that  are  defined  by  productions  whose  first 
symbol  is  the  terminal  symbol  read  by  the 
parser  when  in  that  state.  If  no  such 
non-terminal  transition  symbols  exist,  then 
the  stacked  state  is  not  a  candidate  recovery 
state.  Otherwise,  the  closure  of  the  set  is 
obtained  by  including  all  of  the  state's 
non-terminal  transition  symbols  that  are 
defined  by  productions  whose  first  symbol  is 
a  member  of  the  set. 
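The candidate computation can be sketched as a closure. In this sketch the grammar encoding, the function name `candidate_hierarchy`, and the example productions are hypothetical; returning the closure level by level also records an ordering from the most specific non-terminal outward.

```python
from typing import List, Optional, Set, Tuple

Production = Tuple[str, List[str]]


def candidate_hierarchy(productions: List[Production],
                        state_nonterminals: Set[str],
                        terminal: str) -> Optional[List[Set[str]]]:
    def defined_by_first_symbol(targets: Set[str]) -> Set[str]:
        # Non-terminal transitions of the state defined by a production
        # whose first symbol is in targets.
        return {lhs for lhs, rhs in productions
                if lhs in state_nonterminals and rhs and rhs[0] in targets}

    level = defined_by_first_symbol({terminal})
    if not level:
        return None  # not a candidate recovery state
    levels = [level]
    seen = set(level)
    while True:
        nxt = defined_by_first_symbol(seen) - seen
        if not nxt:
            return levels
        levels.append(nxt)
        seen |= nxt
```

For a state that reads `IDENT` and has transitions on `<expression>` and `<term>`, with productions `<term> ::= IDENT` and `<expression> ::= <term>`, the result is `[{"<term>"}, {"<expression>"}]`.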


There  is  a  definite  hierarchy  to  the  set  of 
non-terminal  transition  symbols  associated  with  a  particular 
terminal  transition  symbol  of  a  particular  DPDA  read  state. 
This  hierarchy  is  computed  for  each  terminal  transition  of 
each DPDA read state, and stored in the "error handling"
segment mentioned in the previous section. The second
refinement to the basic recovery algorithm involves the
utilization of this hierarchy. The hierarchy specifies a
definite ordering for the application of the recovery
algorithm, an ordering that is necessary because, in
general, there does not exist a unique recovery for each
error. Utilizing this hierarchy, the procedure is to
attempt recovery to the lowest possible non-terminal in the
hierarchy that leads to a state that can read a terminal
symbol within the (heuristically set) bounds of the
erroneous phrase. However, due to the possibility of
subsequent errors within the phrase, it may be appropriate
to recover according to a higher level in the hierarchy or
even to a lower state in the DPDA state stack.

In the case of a grammar in which the statements are
delimited by a reserved symbol (such as ";" in PL/I),
recovery will generally be accomplished within the statement
containing the error. In all cases, the algorithm will
terminate because the largest partial phrase can always be
reduced to the goal symbol of the grammar
(<primary_non_terminal>).


IV.D  LR(k) Hierarchy


In this section, we define the significant levels in
the LR(k) hierarchy and indicate the position within this
hierarchy of the CLR(k) grammars.

The hierarchy of LR(k) grammars is indicated below.
The grammars are listed (top to bottom) in order of
decreasing grammatical comprehensiveness.

LR(k)
LALR(k)
CLR(k)
SLR(k)
LR(0)

IV.D.1  LR(k) (Left to Right, k symbols)

A context-free grammar, G, is LR(k) if and only if
every canonical form A = PB of G, except A = S (S is the
goal symbol), has a unique characteristic string P#p which
can be determined by investigating only P and the first k
symbols of B.

IV.D.2  LALR(k) (Look Ahead Left to Right, k symbols)

A  context-free  grammar,  G,  is  LALR(k)  if  and  only  if 
the inadequate states of G's CFSM can be resolved with k
symbols of look-ahead. Operationally, this definition is of
little use. To get an operational description, we turn to
Lalonde's discussion of the LALR(k) algorithm (Lal 71):

The  philosophy  of  the  LALR  routine  is 
such  that  when  a  state  is  inadequate,  a  tree 
of  predecessor  states  which  can  access  this 
inadequate  state  is  built  up  one  level  at  a 
time  (one  level  meaning  one  transition).  To 
each  level  there  corresponds  a  predecessor 
set. The depth or number of successive
predecessor sets which must be calculated
depends naturally on the maximum number of
states which must be pulled (by applying a
production)  starting  originally  from  the 
inadequate  state.  These  predecessor  sets  can 
therefore  be  considered  as  forming  a  mainline 
predecessor  path  (in  actual  fact,  a 
predecessor  tree)  which  dictates  the  past 
history  up  to  this  particular  inadequate 
state.  From  any  given  state,  if  no  #-symbols 
are  encountered,  it  is  an  easy  matter  to 
project  forward  to  obtain  sequential  lists  of 
k  terminals  which  could  be  seen  by  look-ahead. 
When  a  #(P)  symbol  is  encountered,  however, 
the terminals which could follow if production
P  were  applied  must  be  collected.  To  do  this, 
the  number  of  states  equal  to  the  length  of 
the  RHS  of  production  P  is  pulled. 
Furthermore,  of  the  possible  states  which  are 
now  visible,  only  those  which  can  reach  the 
inadequate  state  (namely  those  on  the  mainline 
predecessor  path)  are  candidates.  Having 
found  the  production  goal  for  production  P  in 
each  of  these  states,  the  process  of 
collecting  terminals  is  resumed  starting  from 
each  of  the  destination  states  of  these 
production  goals.  These  terminals  are  of 
course  added  to  the  successive  terminals 
collected  so  far.  This  is  obviously  very 
recursive  and  repetitive  but  nevertheless, 
essential  for  localized  look-ahead. 

Moreover,  if  during  look-aheads,  side 
branches  are  followed  which  lead  away  from  the 
mainline  predecessor  path  (this  occurs 
whenever  a  symbol  is  added  to  the  set  and  at 
least  one  other  succeeding  symbol  is  sought), 
then  pulls  which  are  performed  there  must  fall 
on  the  side  branch  taken  or  (if  the  pull  has 
enough  depth)  on  the  mainline  predecessor 
path.


In  all  cases,  the  mainline  predecessor 
path  is  backed  up  dynamically  as  far  as 
necessary. 


The above discussion does not treat the termination
condition adequately because it does not specify the
action to be taken if a particular CFSM state is re-entered
when looking for a symbol at a given look-ahead level. By
private communication, Lalonde indicated to this author
that his algorithm does not perform such re-entry, the
assumption being that re-entry would add no new look-ahead
symbols. This author implemented Lalonde's algorithm and
verified a suspicion that this re-entry assumption is
invalid. Empirically, this conclusion is based on the
failure of the algorithm to properly handle the PL/I
<conditional_statement> as given in Appendix D. A
sufficient condition for rejecting re-entry for a
particular state and look-ahead level is that the mainline
predecessor paths and the side paths for the proposed
re-entry be the same as those existing at the time of the
original entry. However, the magnitude of the data
manipulations associated with saving and comparing
predecessor paths and side paths influenced us to reject
the LALR(k) algorithm in favor of the simpler, yet
comprehensive, CLR(k) algorithm.
IV.D.3  CLR(k) (Comprehensive Left to Right, k symbols)


A  context-free  grammar  is  CLR(k)  if  and  only  if  the 
inadequate  states  of  its  CFSM  can  be  resolved  by  the 
algorithm  presented  in  Section  III.D.3.  The  basic 
difference  between  the  CLR(k)  algorithm  and  the  LALR(k) 
algorithm  is  that  the  CLR(k)  algorithm  takes  all  apply 
transitions  of  the  apply  states  entered  during  look-ahead. 
This  alleviates  the  need  to  save  mainline  predecessor  paths 
and  side  paths  and  guarantees  that  re-entry  (in  the  previous 
sense)  need  never  be  performed.  The  CLR(k)  grammars 
represent  a  subset  of  the  LALR(k)  grammars,  although  the 
only  grammars  that  have  been  found  to  be  LALR(k)  and  not 
CLR(k)  have  been  pathological  ones. 

A  personal  note  is  appropriate  regarding  the 
definition  of  the  CLR(k)  grammars.  The  CLR(k)  grammars  were 
defined  by  Frank  DeRemer  and  this  author  out  of  a  basic 
dissatisfaction  with  the  overhead  involved  in  the  LALR(k) 
algorithm  and  because  we  believed  that  the  CLR(k)  algorithm 
would cover virtually all grammars of "practical" interest.
Thus  far,  this  has  proved  to  be  a  reasonable  tradeoff. 


IV.D.4  SLR(k) (Simple Left to Right, k symbols)

A  context-free  grammar  is  SLR(k)  if  and  only  if  the 
inadequate  states  of  its  CFSM  can  be  resolved  using  the 
SLR(k)  algorithm  (DeR  69).  The  SLR(k)  algorithm  is  similar 
to  the  CLR(k)  algorithm  except  that  when  a  CFSM  apply 
transition  is  encountered  during  look-ahead,  a  computation 
is  made  on  the  grammar  to  determine  the  set  of  terminal 
symbols  that  can  legitimately  follow  the  non-terminal 
defined  in  the  production  being  applied.  The  SLR(k) 
condition is thus based on information global to the
grammar, and is therefore not as comprehensive as the
LALR(k) and the CLR(k) conditions, which are based on
information local to the CFSM inadequate state being
resolved.
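For k = 1, the global computation described above is the familiar FOLLOW set of a non-terminal. The following is a sketch of the standard fixed-point computation of FIRST and FOLLOW sets; the grammar encoding, the empty-string marker `EPS`, and the end marker `$` are assumptions of this example, not LIS conventions.

```python
from typing import Dict, List, Set, Tuple

Grammar = List[Tuple[str, List[str]]]
EPS = ""   # marks an empty derivation in FIRST sets
END = "$"  # end-of-input marker


def first_follow(grammar: Grammar, start: str):
    nonterminals = {lhs for lhs, _ in grammar}
    first: Dict[str, Set[str]] = {n: set() for n in nonterminals}
    follow: Dict[str, Set[str]] = {n: set() for n in nonterminals}
    follow[start].add(END)

    def first_of_seq(seq: List[str]) -> Set[str]:
        out: Set[str] = set()
        for sym in seq:
            if sym not in nonterminals:
                out.add(sym)          # a terminal begins the sequence
                return out
            out |= first[sym] - {EPS}
            if EPS not in first[sym]:
                return out
        out.add(EPS)                  # every symbol can derive empty
        return out

    changed = True
    while changed:                    # iterate until neither set grows
        changed = False
        for lhs, rhs in grammar:
            new = first_of_seq(rhs)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                tail = first_of_seq(rhs[i + 1:])
                add = (tail - {EPS}) | (follow[lhs] if EPS in tail else set())
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True
    return first, follow
```

An SLR(1) parser resolves an inadequate state by applying a production A ::= w only on look-ahead symbols in FOLLOW(A), regardless of which CFSM state the apply transition occurs in; that is the global information the text contrasts with the local LALR(k) and CLR(k) conditions.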

IV.D.5  LR(0) (Left to Right, 0 symbols)

A context-free grammar is LR(0) if and only if its CFSM
contains no inadequate states. Virtually no grammars of
"practical" value are LR(0).
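The LR(0) test can be made concrete by building the CFSM states as sets of LR(0) items and looking for inadequate ones. This is a sketch under stated assumptions: items are `(production index, dot position)` pairs, the grammar is augmented with `S' ::= S $` as production 0, and a state is counted as inadequate when it holds a complete item together with another complete item or with an item that reads a terminal.

```python
from typing import FrozenSet, List, Set, Tuple

Grammar = List[Tuple[str, Tuple[str, ...]]]
Item = Tuple[int, int]  # (production index, dot position)


def lr0_inadequate_states(grammar: Grammar) -> List[FrozenSet[Item]]:
    """grammar[0] must be the augmented production S' ::= S $."""
    nonterminals = {lhs for lhs, _ in grammar}

    def closure(items: Set[Item]) -> FrozenSet[Item]:
        result, work = set(items), list(items)
        while work:
            p, d = work.pop()
            rhs = grammar[p][1]
            if d < len(rhs) and rhs[d] in nonterminals:
                for q, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[d] and (q, 0) not in result:
                        result.add((q, 0))
                        work.append((q, 0))
        return frozenset(result)

    def goto(state: FrozenSet[Item], sym: str) -> FrozenSet[Item]:
        return closure({(p, d + 1) for p, d in state
                        if d < len(grammar[p][1]) and grammar[p][1][d] == sym})

    # Build all CFSM states reachable from the initial state.
    states = [closure({(0, 0)})]
    seen = set(states)
    i = 0
    while i < len(states):
        state = states[i]
        i += 1
        for sym in {grammar[p][1][d] for p, d in state if d < len(grammar[p][1])}:
            nxt = goto(state, sym)
            if nxt not in seen:
                seen.add(nxt)
                states.append(nxt)

    inadequate = []
    for state in states:
        complete = [p for p, d in state if d == len(grammar[p][1])]
        reads_terminal = any(d < len(grammar[p][1]) and grammar[p][1][d] not in nonterminals
                             for p, d in state)
        if len(complete) > 1 or (complete and reads_terminal):
            inadequate.append(state)
    return inadequate
```

An empty result means the grammar is LR(0). For example, `E ::= E + T | T` with `T ::= id` yields no inadequate states, while `E ::= T + E | T` yields one: after reading `T`, the parser cannot choose between applying `E ::= T` and reading `+` without look-ahead.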
IV.E  Applications of LIS


In  this  section,  we  consider  some  of  the 
language/processor  developments  in  which  the  Language 
Implementation  System  has  been  utilized.  Our  discussions 
here will be brief; the reader wishing more detail on
applications  of  LIS  is  referred  to  Appendices  A,  C,  D,  and 
E.  The  applications  discussed  here  are  only  representative 
of  those  to  date,  which  also  include  PAL,  Algol  60,  and  the 
Procedure  Division  of  COBOL. 

IV.E.1  assign

The  assign  language  is  a  simple  assignment  statement 
language.  The  processor  that  we  developed  for  assign  is  a 
simple  translator  from  the  language  into  a  symbolic 
intermediate  language,  the  type  that  may  be  produced  by  the 
interpretation  phase  of  a  compiler,  for  example.  The  assign 
language  is  discussed  in  detail  in  Appendix  A,  Section  A. 7. 
We  mention  the  assign  language  here  because  it  is  probably 
the  simplest  example  of  the  way  in  which  the  syntax  and 
semantics  of  an  artificial  language  may  be  integrated  into 
a  concise  language  definition  for  processing  by  LIS.  As 
such,  it  serves  as  a  good  introduction  to  the  reader 
wanting  to  investigate  the  more  comprehensive  applications 
in  the  other  appendices. 


IV.E.2  pl6535


The  pl6535  language  is  a  block  structured  language  that 
illustrates  the  translation  of  fundamental  high  level 
language  constructs  into  a  particular  formal  semantic 
system.  The  pl6535  language  and  its  processor  are  discussed 
in  detail  in  Appendix  C.  pl6535  was  developed  as  a  term 
project  for  a  graduate  computer  science  course  at  MIT,  and 
the  author  was  ably  assisted  in  this  development  by  fellow 
graduate students, Thomas Gearing and Gordon Weekly
(AGW 72).


IV.E.3  FILETRAN

The  FILETRAN  language  is  being  developed  at  Honeywell 
for  the  purpose  of  providing  a  facility  for  translating 
arbitrary  data  files  into  data  files  compatible  with  a 
particular  Honeywell  computer.  The  size  and  efficiency  of 
the  FILETRAN  language  and  its  processor  are  indicated  in 
Section IV.B.2.

IV.E.4  SCHEMA

The  COBOL  SCHEMA  and  its  processor  represent  an 
implementation  of  a  data  base  schema  language.  The  size  and 
efficiency of the SCHEMA language and its processor are
indicated in Section IV.B.2.


IV.E.5  PL/I

The  primary  grammar  of  PL/I  that  appears  in  Appendix  D 
is  the  most  complex  grammar  submitted  to  LIS  to  date.  The 
grammar  is  a  very  large  sub-grammar  of  the  IBM  Laboratory 
Vienna's  specification  of  the  concrete  syntax  of  PL/I 
(AOU  68),  and  includes  declarations,  input/output,  and 
on-conditions. 

Appendix  D  also  includes  the  PL/I  lexical  grammar, 
which is the most complete lexical grammar yet submitted to
LIS.


IV.E.6  express

The  express  language  is  the  only  example  that  we  give 
in  which  LIS  was  utilized  in  the  development  of  an 
interactive  management  information/decision  system  language. 
A  discussion  of  this  development  is  given  in  Appendix  E. 


IV.F  Areas for Future Research and Development


In  this  section,  we  briefly  consider  areas  of  research 
and  development  that  we  feel  will  have  significant  impact  on 
language  implementation  system  technology.  Our  discussions 
here are admittedly too abbreviated to do more than suggest
the  broad  boundaries  of  these  efforts. 

IV.F.1  Formal Semantic Systems

It is widely recognized that formal semantic systems
have not achieved the same level of development as the
corresponding work in formal grammar systems and the
application of automata theory to the recognition of
artificial languages. As Winograd states (Win 71), "The
field of semantics has always been a hazy swampland". This
is  due  to  several  factors.  First,  the  problems  of  formal 
semantics  are  simply  more  difficult  than  the  problems  of 
language  recognition.  Second,  the  rapid  development  of 
fundamentally  new  language  constructs  and  hardware  features 
has  made  the  specification  of  a  set  of  formal  semantic 
primitives even more difficult. Finally, and basically
because  of  the  previous  two  points,  there  seems  to  be  a 
general  disagreement  as  to  what  constitutes  an  appropriate 
set  of  formal  semantic  primitives.  Nevertheless,  progress 
continues to be made, as witnessed by the substantial
amount of published material on the subject (see
Bibliography). Appendix C describes our initial attempt at
utilizing  LIS  in  a  way  that  will  contribute  to  the 
advancement  of  formal  semantic  systems.  We  see  two  areas  of 
future  development  that  are  appropriate  to  the  approach 
that  we  have  taken.  First,  the  implementation  of  a  Base 
Language  interpreter  would  provide  an  experimental 
environment  in  which  the  Base  Language  primitives  could  be 
challenged  and  modified,  the  criteria  being  their  ability 
to  adequately  and  conveniently  represent  higher  level 
language  constructs. 

Second,  our  work  on  pl6535  and  its  Base  Language 
translator  has  convinced  us  that  PL/I  is  a  most 
unattractive  language  for  specifying  formal  semantics. 
Therefore,  an  appropriate  next  step  would  be  the 
utilization of LIS in developing a language well suited to
specifying  the  Base  Language  interpretation  of  higher  level 
language  constructs. 


IV.F.2


It has been widely accepted on intuitive grounds that
there exist significant variations in the complexities of
popular programming languages. Few would dispute, for
example, that the structure of the Basic language is rather
simple, while that of PL/I is relatively more formidable.
We  conveniently  express  this  notion  by  saying  that  PL/I  is 
a  more  complex  language  than  Basic.  But  what  does 
complexity  mean?  Can  it  be  measured?  Recent  investigations 
into  complexity  measures  (Don  72,  Hag  70)  are  beginning  to 
uncover  some  of  the  issues,  but  as  yet,  no  satisfactory 
measures  have  been  proposed.  With  respect  to  a  particular 
language,  possible  suggestions  for  complexity  measures 
include the difficulty in recognizing the language's
constructs (time of parse), the number of steps in the
derivations of the constructs, the size of the parse tree,
and the space requirements of the parser for the language.
On first examination, the above issues would seem to have
little relevance to the average programmer. However,
linguistic constructs that are complex in a formal sense
(length of derivation, time of parse, etc.) are also
complex in a human sense: they take a long time to write,
to understand, and to debug. Thus, complexity is
significant both from a grammar-theoretic viewpoint and from
a practical viewpoint. Furthermore, to the extent that
measures of complexity can be appropriately defined, they
may be utilized in restructuring languages to reduce
complexity, thereby making them more palatable.
Although our experience with LIS is as yet too limited to
enable us to define precisely a formal set of complexity
measures, we have nevertheless witnessed significant
variations in "complexity" among the languages to which LIS
has been applied. We briefly consider two of these, and
in so doing suggest possibilities for future developments
in the evolution of complexity measures based on LR(k)
systems.

In Section IV.B.2, we discussed the efficiency of two
of the languages to which LIS has been applied, FILETRAN
and PL/I. Referring again to that discussion, we note
that while the sizes of the grammars and their associated
DPDAs are roughly equivalent, a substantial difference exists in
the rate at which the two languages can be recognized by
our CLR(k) parser. The fact that FILETRAN can be parsed at
145,000 tokens/minute, while PL/I is parsed at the
relatively slower rate of 90,000 tokens/minute, indicates
that PL/I presents a more difficult recognition problem. We
shall adopt one of the above suggestions and associate
difficulty of recognition with grammar complexity. We are,
therefore, interested in a predictor of complexity based on
our CLR(k) strategy. One predictor that we have observed
is the time taken to compute the CLR(k) parser of a grammar:
relatively speaking, the longer the time taken to compute
the parser, the longer the time required to parse
the constructs of the language. In the present case, we
note that the parser for PL/I took almost twice as long to
compute as did the parser for FILETRAN. Another predictor
that we have observed is the number of look-ahead states in
the resulting parser. Again, relatively speaking, the more
look-ahead states, the longer the time required to
recognize. This is reasonable, since the need for
look-ahead originates from a local ambiguity, a basic form
of complexity. In our present case, the parser for PL/I
contained 60 look-ahead states, while the parser for
FILETRAN contained only 23. The
look-ahead states of the PL/I parser are largely due to the
ubiquitous <expression>.
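As a small worked check of the figures quoted above, the following Python sketch simply restates the reported numbers and derives ratios from them; nothing is measured, and the parser compute times are omitted because the text gives them only qualitatively ("almost twice as long").

```python
# Figures as reported in the text above; nothing here is measured.
parse_rate = {"FILETRAN": 145_000, "PL/I": 90_000}   # tokens per minute
lookahead_states = {"FILETRAN": 23, "PL/I": 60}

speed_ratio = parse_rate["FILETRAN"] / parse_rate["PL/I"]
lookahead_ratio = lookahead_states["PL/I"] / lookahead_states["FILETRAN"]
# 60,000 ms per minute divided by tokens per minute gives ms per token.
ms_per_token = {lang: 60_000 / rate for lang, rate in parse_rate.items()}

print(f"FILETRAN is parsed {speed_ratio:.2f}x as fast as PL/I")
print(f"PL/I's parser has {lookahead_ratio:.2f}x as many look-ahead states")
for lang, ms in ms_per_token.items():
    print(f"{lang}: {ms:.3f} ms per token")
```

Both predictors point the same way (about 1.6x in parse rate, about 2.6x in look-ahead states), but not in the same proportion, which is consistent with the text treating them as relative rather than proportional predictors.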

IV.F.3 Microprogrammed LIS Processor Control

The space and time efficiency of our CLR(k) parsers
has been amply demonstrated in Section IV.B. Furthermore,
owing to the simplicity of the CLR(k) parser control, it is
entirely feasible to implement this control directly in
hardware, resulting in even greater parse-time efficiency.
Given the trends in hardware technology, it is appropriate
to consider implementing this control by microprogramming.
Whether this effort is undertaken on a particular computer
depends on such factors as the gain in efficiency
relative to cost, and the number of language processors that
would benefit from the increased efficiency.

In  a  paper  for  a  graduate  computer  science  course  at 
MIT (Alt 72a), the author investigated some of the issues
involved  in  a  firmware  implementation  of  CLR(k)  parser 
control.  The  actual  representation  of  the  control  and  the 
control  primitives  was  in  the  form  of  a  computation  schema 
(Den  70).  Transformations  can  be  performed  on  the  schema 
to  yield  a  data  flow  structure  and  a  control  structure, 
which  together  constitute  an  asynchronous  modular  hardware 
representation  of  CLR(k)  processor  control. 
IV.G Conclusion


Our objective in this thesis was to develop a language
implementation  system  satisfying  the  criteria  established  in 
Chapter  I,  and  to  utilize  this  system  in  the  development  of 
a  wide  variety  of  artificial  languages  and  their  associated 
processors.  We  feel  that  we  have  been  successful  in  this 
regard.  We  have  shown  the  Language  Implementation  System  to 
be  an  efficient,  reliable,  and  flexible  language  development 
and  implementation  support  system.  We  have  demonstrated 
the  system's  applicability,  not  only  to  traditional 
languages  and  their  processors,  but  also  to  problem  oriented 
languages  and  interactive  languages  for  management 
information/decision  systems. 

It  is  the  author's  conviction  that  the  demand  for 
special-purpose, end-user computational systems and
interactive  decision  support  systems  will  accelerate 
rapidly  over  the  coming  years.  Furthermore,  it  is  also 
believed  that  the  successful  development  and  evolution  of 
such  systems  will  depend  critically  on  the  appropriateness 
of  the  supporting  language  facilities.  We  are,  therefore, 
convinced  that  systems  such  as  the  Language  Implementation 
System  will  come  to  play  a  major  role  in  the  expansion  of 
the  domain  to  which  computation  can  be  successfully  applied. 
Appendix A


LIS  User  Reference  Manual 

A.1 Introduction

In  this  appendix,  we  present  the  LIS  User  Reference 
Manual.  This  Manual  is  an  abridged  form  of  a  Honeywell 
publication  of  the  same  title  (Alt  72b).  Its  purpose  is  to 
describe  the  way  in  which  the  language  designer/implementer 
utilizes  the  Language  Implementation  System  in  developing 
artificial  languages  and  their  associated  processors.  The 
Manual  is  intended  to  be  self-contained,  and  therefore 
includes  certain  discussions  from  Chapters  I  -  IV  that  are 
relevant  to  the  utilization  of  LIS. 
The fundamental structure of the Language
Implementation System is indicated in Figure A.1. Language
development resolves into two interacting phases, Processor
Generation and Processor Execution.


[Figure A.1: Structure of the Language Implementation System, from Initial Language Definition to Language/Processor Stability]


A.2.1 Processor Generation


Processor Generation consists of the execution of the
LIS Pre-Processor and the LIS CLR(k) Generator for purposes
of computing the following functional results from the
submitted LIS Language Definition:

a. The parsing tables (DPDAs), which are used to
"drive" LIS Processor Control in parsing legal
Input Text of the language.

b. A PL/I procedure which represents the semantic
interpretation to be associated with the
language's syntactic constructs.


LIS Language Definition

A precise specification of the format of an LIS
Language Definition may be found in Sections A.3 and A.4.
For our present purposes, however, we may consider an LIS
Language Definition to consist of:

a. A Backus Naur Form specification of the
syntax of the language being defined.

b. A PL/I specification of the semantics of the
language, expressed in-line with the BNF
specification on a per-BNF-rule basis. In
specifying the semantics of a particular
syntactic construct, the language
designer/implementer uses PL/I to define the
actions that his language processor is to
perform when the corresponding syntactic
construct is recognized.


LIS Pre-Processor

The LIS Pre-Processor performs the following functions:

a. The LIS Pre-Processor computes the Semantic
Source Segment from the Language Definition.

b. The LIS Pre-Processor performs various
validity checks and analysis procedures on the
submitted grammar, delivering diagnostics for
those checks and analysis procedures that the
grammar fails to satisfy. Certain of the
checks and procedures are of a warning nature
only; failing to satisfy these will not
prevent the activation of the LIS CLR(k)
Generator. Others are of a fatal nature, and
must be satisfied if the LIS CLR(k) Generator
is to be activated. The checks and analysis
procedures that have been implemented on LIS
are discussed from the language design
viewpoint in Section A.3.
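To make the fatal/warning distinction concrete, here is a minimal Python sketch of two checks a pre-processor of this kind might perform. Both checks are assumptions chosen for illustration, not a listing of the LIS checks (those are described in Section A.3): a reference to an undefined non-terminal is treated as fatal, and a non-terminal that cannot be reached from the goal symbol only draws a warning. The grammar representation (a dictionary from <left part> to lists of symbols) and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[List[str]]]  # <left part> -> alternatives (lists of symbols)

def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">") and len(symbol) > 2

def check_grammar(grammar: Grammar, goal: str) -> Tuple[List[str], List[str]]:
    """Return (fatal, warnings) diagnostics for a hypothetical pre-processor pass."""
    fatal: List[str] = []
    warnings: List[str] = []
    if goal not in grammar:
        fatal.append(f"goal symbol {goal} is not defined")
    # Assumed fatal check: every referenced non-terminal must be defined.
    for left, alternatives in grammar.items():
        for a, alt in enumerate(alternatives, start=1):
            for sym in alt:
                if is_nonterminal(sym) and sym not in grammar:
                    fatal.append(f"{sym} referenced in {left} (alternative {a}) is undefined")
    # Assumed warning check: non-terminals unreachable from the goal symbol.
    reachable, stack = set(), [goal]
    while stack:
        nt = stack.pop()
        if nt in reachable or nt not in grammar:
            continue
        reachable.add(nt)
        stack.extend(s for alt in grammar[nt] for s in alt if is_nonterminal(s))
    for left in grammar:
        if left not in reachable:
            warnings.append(f"{left} is defined but unreachable from {goal}")
    return fatal, warnings
```

Only the fatal list would need to be empty for control to pass on to the CLR(k) Generator; the warnings are reported but do not block it.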


LIS CLR(k) Generator

If the LIS Pre-Processor encounters no fatal errors in
the LIS Language Definition, control is automatically passed
to the LIS CLR(k) Generator. This phase of the system
attempts to compute a CLR(k) parser for k less than or equal
to a certain internally set value (currently set at 3). The
CLR(k) (Comprehensive Left to Right, looking ahead a maximum
of k symbols) grammars constitute a large subset of the
LR(k) grammars, which in turn possess the following
characteristics:

a. The LR(k) grammars generate exactly the
deterministic context-free languages.

b. The LR(k) grammars represent the largest class
of grammars known to be parsable in linear
time (proportional to the length of the input
text) during a single left to right scan.

c. A grammar satisfying the LR(k) condition is
unambiguous.
Intuitively, the LR(k) condition implies that the
identity of a particular syntactic construct may be
ascertained by looking indefinitely far to the left and at
most k symbols to the right of the current position in the
parse (symbols meaning characters or lexical constructs,
depending on whether a lexical or a primary grammar is being
defined). This is an extremely comprehensive
condition, and covers virtually all artificial languages
that are likely to be of "practical" interest.

In  attempting  to  compute  the  parsers,  the  LIS  CLR(k) 
Generator  delivers  diagnostics  for  those  areas  of  the 
language  that  do  not  satisfy  the  CLR(k)  condition.  These 
diagnostics  include  sufficient  information  on  the 
language's  local  ambiguities  to  enable  the  language 
designer/implementer  to  modify  the  syntax  of  his  language 
in  order  to  make  it  CLR(k).  Assuming  that  the  grammar  is 
CLR(k),  the  functional  output  of  the  LIS  CLR(k)  Generator  is 
a segment containing one or two DPDAs, depending on whether
a lexical parser, a primary parser, or both are computed.
The  DPDAs,  in  combination  with  LIS  Processor  Control, 
constitute  the  parsers  for  the  processor  of  the  language 
being  defined. 
A.2.2 Processor Execution


A  processor  for  the  artificial  language  specified  by 
the  LIS  Language  Definition  is  synthesized  by  combining  the 
DPDAs and the Semantic Object Segment with LIS Processor
Control.

LIS  Processor  Control  coordinates  the  overall 
language  processing  activity.  In  parsing  the  Input  Text, 
it  is  "driven"  by  the  DPDAs,  and  upon  recognition  of  a 
particular  syntactic  construct,  it  activates  the  semantics 
associated  with  that  construct.  It  is  the  responsibility  of 
the activated semantics subsequently to return control to
LIS Processor Control so that language processing may
continue.
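The division of labour just described (a table-driven parser that activates per-rule semantics on each recognition and resumes once they return) can be sketched in Python. The tables below are hand-built SLR(1) tables for a hypothetical two-rule grammar, not DPDAs produced by the LIS CLR(k) Generator (which may use up to three look-ahead symbols), and the lambdas stand in for the PL/I semantics LIS attaches to each BNF rule.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical grammar (illustration only, not an LIS language):
#   rule 1: <sum> ::= <sum> + n
#   rule 2: <sum> ::= n
RULES: Dict[int, Tuple[str, int]] = {1: ("<sum>", 3), 2: ("<sum>", 1)}  # rule -> (left part, length)

# Hand-built SLR(1) tables; "$" marks the end of the input.
ACTION: Dict[Tuple[int, str], Tuple[str, int]] = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", 0),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO: Dict[Tuple[int, str], int] = {(0, "<sum>"): 1}

# Per-rule semantics, activated when the rule's construct is recognized.
SEMANTICS: Dict[int, Callable[[list], int]] = {
    1: lambda rhs: rhs[0] + rhs[2],   # value of <sum> plus value of n
    2: lambda rhs: rhs[0],            # value of n
}

def run(tokens: List[Tuple[str, Optional[int]]]) -> int:
    """Drive the tables over (kind, value) tokens and return the semantic value."""
    tokens = tokens + [("$", None)]
    states: List[int] = [0]
    values: list = []
    pos = 0
    while True:
        kind, value = tokens[pos]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at token {pos}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            values.append(value)
            pos += 1
        elif op == "reduce":
            left, length = RULES[arg]
            rhs = values[-length:]
            del states[-length:]
            del values[-length:]
            values.append(SEMANTICS[arg](rhs))        # activate the rule's semantics...
            states.append(GOTO[(states[-1], left)])   # ...then parsing resumes here
        else:  # accept
            return values[-1]
```

For example, `run([("n", 1), ("+", None), ("n", 2), ("+", None), ("n", 3)])` returns 6. Keeping the control loop generic and putting all language knowledge in the tables and the semantic actions mirrors the separation between LIS Processor Control, the DPDAs, and the Semantic Object Segment.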

The semantics can access the Input Text directly, and
the normal situation is for Processor Control to coordinate
these accesses by directing the semantics to specific text
such as identifiers, key-symbols, etc. As indicated in
Figure A.1, there is no explicit output from Processor
Execution. It is therefore the responsibility of the
semantics to manage its own output, as well as its alternate
input files, temporary files, symbol tables, etc.

As a matter of processing efficiency, we note that it
is possible to combine the Semantic Source Segment with LIS
Processor Control into a single segment, which is then
compiled by the PL/I compiler. The advantage of this
combination is that activations of BNF rule semantics may
be effected by "goto"s rather than "call"s. In addition, an
option is planned for LIS that will permit the DPDAs to be
produced in the form of initialized PL/I declarations, which
will also be compiled with LIS Processor Control. The
combined result of these optimizations is that it will be
possible for the three functional units comprising Processor
Execution to be combined into a single PL/I procedure,
resulting in significant improvements in space and time
efficiency.


A.3 The Definition of Artificial Languages - Specification of Syntax

In this section, we discuss the way in which the
language designer/implementer formulates the specification
of the syntax of his language for processing by LIS. In
Section A.4, we address the problem of semantic
specification. In following these discussions, the reader
may find it useful to refer to the example of a simple
language definition in Section A.7.

LIS provides the capability to compute both lexical and
primary parsers from the appropriate specifications. For
the most part, the structure and format of these
specifications are the same. The features of the
specifications that are grammar dependent are discussed in
the appropriate sections below; the initial discussion will
focus on the features that the specifications have in
common.

A.3.1 Syntax Specification - General

The  Language  Implementation  System  accepts  the  syntax 
of  artificial  languages  specified  in  free  format  Backus  Naur 
Form  (BNF).  The  purpose  of  the  present  discussion  is  to 
describe  the  structure  and  format  of  LIS  acceptable  BNF. 




BNF Specification on LIS

On LIS, BNF specifications are structured as follows:

a. A BNF specification consists of a collection
of BNF rules. With one minor exception
(discussed in Section A.3.3), the order of the
rules is irrelevant; the specification is
non-procedural.

b. A BNF rule must start on a new line, may
extend over several lines, and is terminated
with an exclamation point ("!").

c. A BNF rule consists of a <left part> and a
<right part>, separated by the string "::=".


Thus, a BNF specification is an unordered set of BNF
rules of the following form:

<left part> ::= <right part> !

BNF Rule - <left part>

The <left part> of a BNF rule identifies the
non-terminal that is defined in that rule. Thus, the format
of the <left part> is identical to the format of BNF
non-terminals, which is as follows:

<character-string>

Character-string is restricted so that:

a. Character-string must not exceed 70 characters
in length, including blank, tab, new-line, and
new-page characters.

b. Character-string may not include any of the
characters "<", ">", "|", "!", or the string
"::=".
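The two restrictions on non-terminal names are mechanical enough to check directly. A minimal Python sketch, assuming escapes have already been resolved; the function name and message wording are hypothetical, not LIS diagnostics:

```python
from typing import List

FORBIDDEN_CHARS = "<>|!"

def nonterminal_errors(text: str) -> List[str]:
    """List violations of the <character-string> restrictions for one non-terminal."""
    if not (len(text) >= 2 and text.startswith("<") and text.endswith(">")):
        return ["a non-terminal must have the form <character-string>"]
    body = text[1:-1]  # the character-string; blanks, tabs, etc. count toward the limit
    errors: List[str] = []
    if len(body) > 70:
        errors.append(f"character-string is {len(body)} characters long (limit 70)")
    bad = sorted({c for c in body if c in FORBIDDEN_CHARS})
    if bad:
        errors.append("character-string contains forbidden character(s): " + " ".join(bad))
    if "::=" in body:
        errors.append('character-string contains the string "::="')
    return errors
```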
BNF Rule - <right part>

The <right part> of a BNF rule consists of one or more
alternative definitions of the <left part>, separated by the
character "|". When referring to a particular alternative
definition of the non-terminal <left part> in a
multi-alternative BNF rule, one refers to the particular
alternative in question or, equivalently, to the production
formed by constructing a single-alternative BNF rule from
the <left part> and that alternative. Alternatives exist
primarily for convenient syntactic notation. However, when
associating semantic interpretation with a particular BNF
rule, account must be taken of the alternatives, and this is
described in Section A.4.


Each alternative of a <right part> consists of at least
one symbol. A symbol may be either a non-terminal or a
terminal string of ASCII characters, the terminal string
being subject to the following conditions:

a. The terminal string starts with the first
ASCII character following the preceding symbol of
the alternative (or following "|" or "::=" if
it is the first symbol of the alternative)
that is neither a blank, nor a tab, nor a
new-line, nor a new-page character. Since
blank, tab, new-line and new-page characters
are not normally considered part of a terminal
string, they must be escaped if they are to be
significant (escaping conventions are
described in Section A.3.4).

b. The terminal string may not contain any of the
characters "<", ">", "|", "!", or the string
"::=", unless they are escaped.
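The rule format described in this section can be sketched as a small reader in Python. This is an illustration, not the LIS Pre-Processor: it assumes no escaped characters (escapes are covered in Section A.3.4), does not check that each rule starts on a new line, and also recognizes the four-character s->f encoding of the lexical grammar (Section A.3.3) as a separate symbol kind. The names `parse_bnf` and `Symbol` are hypothetical.

```python
import re
from typing import List, Tuple

Symbol = Tuple[str, str]  # (kind, text) with kind "nt", "range", or "t"

# An s->f range first, then a <non-terminal>, then a run of other non-blank characters.
_SYMBOL = re.compile(r"(?<!\S)\S->\S(?!\S)|<[^<>|!]*>|[^\s<>|!]+")

def parse_bnf(spec: str) -> List[Tuple[str, List[List[Symbol]]]]:
    """Split an LIS-style BNF specification into (left part, alternatives)."""
    rules = []
    for chunk in spec.split("!"):              # every rule is terminated by "!"
        if not chunk.strip():
            continue
        left, sep, right = chunk.partition("::=")
        left = left.strip()
        if not sep or not (left.startswith("<") and left.endswith(">")):
            raise ValueError(f"malformed rule: {chunk.strip()!r}")
        alternatives = []
        for alt in right.split("|"):           # alternatives are separated by "|"
            symbols: List[Symbol] = []
            for text in _SYMBOL.findall(alt):  # blanks, tabs, new-lines only delimit
                if re.fullmatch(r"\S->\S", text):
                    kind = "range"
                elif text.startswith("<") and text.endswith(">") and len(text) > 2:
                    kind = "nt"
                else:
                    kind = "t"
                symbols.append((kind, text))
            if not symbols:
                raise ValueError(f"empty alternative in rule for {left}")
            alternatives.append(symbols)
        rules.append((left, alternatives))
    return rules
```

For instance, `parse_bnf("<s> ::= <s> + n | n !")` yields one rule for `<s>` with the alternatives `[("nt", "<s>"), ("t", "+"), ("t", "n")]` and `[("t", "n")]`.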

A.3.2 Syntax Specification - Primary Grammar

We use the term primary grammar to refer to that
subset of a particular language's complete syntax
specification that excludes the specification of the lexical
non-terminals of the language.

<primary_non_terminal>

The non-terminal <primary_non_terminal> has been
reserved on LIS for purposes of identifying the goal
symbol of the primary grammar. By defining
<primary_non_terminal>, the user identifies the ultimate
syntactic objective of his language. Except that it must be
defined in order for LIS to compute a parser for the primary
grammar, no fundamental limitations are placed on the
definition of <primary_non_terminal>.

Spacing Conventions

The convention on LIS is that artificial languages
are free format. The non-terminal <non_lexical> has been
reserved so as to permit the specification of those
characters that are to serve as explicit delimiters and
spacing characters within the defined language. The precise
specification of <non_lexical> is given in Section A.3.4.
The interpretation of <non_lexical> is that an indefinite
number of <non_lexical> characters may appear between those
syntactic constructs of the defined language that correspond
to the symbols in the alternatives of the BNF specification
of the language. Equivalently, an indefinite number of
<non_lexical> characters may appear in the language at points
corresponding to the symbol delimiters in the alternatives of
the primary grammar. By an indefinite number of <non_lexical>
characters, we mean zero or more, or one or more, depending
on whether at least one <non_lexical> character is required
in order to avoid conflict with a single string of
characters that may be recognized as a particular lexical
construct.

For example, consider the following BNF rule:

<go_to_phrase> ::= go to <identifier> !

Assuming that <identifier> is defined as usual (e.g. as
in PL/I), we see that "gotoa" would be recognized as a
single <identifier>, "goto a" would be recognized as a
sequence of two <identifier>s, and only some form of
"go to a", in which at least one <non_lexical> character
exists between each pair of symbols, would be recognized as a
<go_to_phrase>. On the other hand, consider the following
example (<identifier> and <integer> are used in the usual
non-terminal sense):

<subscripted_identifier> ::= <identifier> ( <integer> ) !

In this case, there is no possible conflict between
<identifier> and "(", between "(" and <integer>, or between
<integer> and ")", so that here an indefinite number implies
zero or more. In the above rule, "(" and ")" are implicit
delimiters; their use as delimiters is based on context and
extracted from the grammar, as opposed to being stated
explicitly, as with <non_lexical>.

In  general,  therefore,  an  indefinite  number  of 
<non_lexical>  characters  is  interpreted  to  mean  a  number 
greater  than  or  equal  to  that  minimum  number  required  to 
avoid  improper  recognition  due  to  symbol  conflicts  with 
lexical  constructs. 
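The "minimum number" rule can be approximated by looking only at the characters on either side of each symbol boundary. The Python sketch below is a deliberate simplification under a stated assumption, not how LIS derives delimiting from the grammar: it assumes PL/I-like lexical constructs in which letters, digits and "_" can run together into one <identifier> or <integer>, so that at least one <non_lexical> character is needed exactly when both sides of a boundary are such characters. The helper names are hypothetical.

```python
import string
from typing import List

# Assumption: letters, digits and "_" can run together into one lexical construct.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def min_separators(left: str, right: str) -> int:
    """Fewest <non_lexical> characters needed between two adjacent symbols' text."""
    # If both sides could extend the same lexical construct, one separator is needed.
    return 1 if left[-1] in WORD_CHARS and right[0] in WORD_CHARS else 0

def required_gaps(symbols: List[str]) -> List[int]:
    """Minimum separators at each boundary of a sequence of symbol texts."""
    return [min_separators(a, b) for a, b in zip(symbols, symbols[1:])]
```

With this rule, `required_gaps(["go", "to", "a"])` gives `[1, 1]` (the <go_to_phrase> case), while `required_gaps(["x", "(", "3", ")"])` gives `[0, 0, 0]` (the <subscripted_identifier> case).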

Although we have discussed the use of <non_lexical> as
it applies to the primary grammar, it is the convention on
LIS to define <non_lexical> with the specification of the
lexical grammar in those cases in which the two grammars are
defined in separate LIS Language Definitions.
A.3.3 Syntax Specification - Lexical Grammar


The lexical constructs of a language are those basic
structural elements that are used in writing text in the
language. These include the key-symbols, the implicit
delimiters (e.g. ")", "+"), and the lexical
non-terminals (e.g. <identifier> and <integer>, in the usual
sense) that are built up from the terminal characters of
the language, but whose substructure is of no fundamental
interest, either syntactically or semantically. Of these
three classes, the key-symbols and implicit delimiters are
derived from the primary grammar, and it is the function of
the lexical grammar to define the structure of the lexical
non-terminals.

<lexical_non_terminal>

The non-terminal <lexical_non_terminal> has been
reserved on LIS for purposes of identifying the goal
symbol(s) of the lexical grammar, i.e. for identifying the
lexical non-terminals of the language.

Definitions  of  <lexical_non_terminal>  are  restricted  to 
the  extent  that  its  productions  may  consist  only  of  single 
non-terminal  symbols. 

LIS may be executed for purposes of computing parsers
for both primary and lexical grammars, or for either
separately, and in each of these cases
<lexical_non_terminal> has the following meaning:

a. Compute primary and lexical parsers

In this case, <lexical_non_terminal> is
defined so as to establish the "division of
labor" between the primary parser and the
lexical parser. Thus, the lexical parser
builds up the <lexical_non_terminal>s from the
terminal characters, and the primary parser
accepts the <lexical_non_terminal>s,
key-symbols, and implicit delimiters as its
basic elements in parsing the language being
defined (as specified by
<primary_non_terminal>).

b. Compute only the primary parser

This case is similar to case a, except that
it is not necessary for LIS to have any
immediate knowledge as to the structure of the
<lexical_non_terminal>s. Thus
<lexical_non_terminal> is defined so as to
inform LIS of the basic elements that the
primary parser will receive from the lexical
parser.

c. Compute only the lexical parser

In this case, the definition of
<lexical_non_terminal> serves to indicate to
LIS that those non-terminals defined to be
<lexical_non_terminal>s represent the goal
symbols of the lexical grammar.
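The "division of labor" in case a can be pictured with a hand-written lexer in Python. Everything concrete here is an assumption for illustration: the key-symbol and delimiter sets, the regular expressions standing in for the lexical grammar of <identifier> and <integer>, and the choice to pass on a word spelled like a key-symbol as that key-symbol (a reserved-word treatment that a real LIS language need not adopt). Only the default <non_lexical> set follows the text (Section A.3.4).

```python
import re
from typing import Iterator, Tuple

# Assumed example sets, not taken from any particular LIS definition.
KEY_SYMBOLS = {"go", "to"}            # key-symbols derived from the primary grammar
DELIMITERS = {"(", ")", "+"}          # implicit delimiters from the primary grammar
LEXICAL_NON_TERMINALS = [             # what <lexical_non_terminal> would name
    ("<identifier>", re.compile(r"[a-z]+")),
    ("<integer>", re.compile(r"[0-9]+")),
]
NON_LEXICAL = " \t\n\f"               # default: blank, tab, new-line, new-page

def lex(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (primary-grammar symbol, source text) pairs for the primary parser."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in NON_LEXICAL:
            pos += 1
        elif ch in DELIMITERS:
            yield ch, ch
            pos += 1
        else:
            for name, pattern in LEXICAL_NON_TERMINALS:
                match = pattern.match(text, pos)
                if match:
                    word = match.group()
                    # A word spelled like a key-symbol is passed on as that key-symbol.
                    yield (word if word in KEY_SYMBOLS else name), word
                    pos = match.end()
                    break
            else:
                raise ValueError(f"illegal character {ch!r} at position {pos}")
```

Note that `lex("goto a")` yields two `<identifier>` symbols, matching the behaviour described for "goto a" in Section A.3.2, while `lex("go to a")` yields the two key-symbols followed by one `<identifier>`.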


It may turn out to be convenient to call upon LIS
twice, once to compute a parser for the primary grammar and
again to compute a parser for the lexical grammar. Since
the lexical grammar generally stabilizes long before the
primary grammar, this activation sequence is efficient to
the extent that changes may be introduced into the two
grammars independently. In this situation, the definition
of <lexical_non_terminal> performs a communication function
between the two computations, and it is here that we have
the only exception to the previously stated rule that the
order of the BNF rules is of no significance.

In those cases in which LIS is used to compute a
primary parser and a lexical parser, but in separate
activations using separate LIS Language Definitions,
the system requires that the rules defining
<lexical_non_terminal> be the same in each definition
and that they appear (in the same order) as the first
rules in each definition. However, if the separate
activations are performed on the same definition
segment, then the placement of the rules defining
<lexical_non_terminal> is of no significance.
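The ordering requirement for separate definitions can be expressed as a small check. In this Python sketch, each definition is assumed to be available as a list of (left part, right-part text) pairs in source order; that representation, the whitespace normalization used to compare free-format rules, and the function names are all hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (<left part>, right-part text), in source order
LEX = "<lexical_non_terminal>"

def leading_lexical_rules(rules: List[Rule]) -> List[Rule]:
    """The run of <lexical_non_terminal> rules at the head of a definition."""
    head = []
    for left, right in rules:
        if left != LEX:
            break
        head.append((left, " ".join(right.split())))  # ignore free-format spacing
    return head

def check_split_definitions(primary: List[Rule], lexical: List[Rule]) -> List[str]:
    """Errors if the two definitions do not open with identical LEX rules."""
    errors = []
    for name, rules in (("primary", primary), ("lexical", lexical)):
        head = leading_lexical_rules(rules)
        if not head:
            errors.append(f"{name} definition does not start with {LEX} rules")
        if any(left == LEX for left, _ in rules[len(head):]):
            errors.append(f"{name} definition has {LEX} rules after other rules")
    if leading_lexical_rules(primary) != leading_lexical_rules(lexical):
        errors.append(f"{LEX} rules differ between the two definitions")
    return errors
```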

Special Lexical Encoding

A special lexical encoding convention has been
implemented on LIS. Though implemented primarily for
purposes of space and time efficiency in the resulting
parsers, the convention also provides a convenient syntactic
notation. The encoding permits the grouping of those
characters that are equivalent in their effect on the
structure of the <lexical_non_terminal>s, and that also have
contiguous ASCII collating codes. The encoding may be
thought of as an additional type of symbol in the lexical
grammar. The format of the encoding is as follows:

s->f

In the above format, s is the starting character in the
sequence (the one with the smaller numeric code) and f is
the final character in the sequence (the one with the larger
numeric code). The encoding is represented as four
contiguous characters, with no intervening blanks, tabs,
etc. An example of the use of the encoding is as follows:

<small_letter> ::= a->z !
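A short Python sketch of what an s->f symbol stands for, and of why it is cheap to test. The helper names are hypothetical, and Python's code-point ordering is used as the ASCII collating order (the two agree for ASCII characters).

```python
def expand_range(encoding: str) -> str:
    """Expand an s->f encoding into the characters it stands for, in collating order."""
    if len(encoding) != 4 or encoding[1:3] != "->":
        raise ValueError(f"not a four-character s->f encoding: {encoding!r}")
    s, f = encoding[0], encoding[3]
    if ord(s) > ord(f):
        raise ValueError("s must have the smaller collating code")
    return "".join(chr(code) for code in range(ord(s), ord(f) + 1))

def in_range(encoding: str, ch: str) -> bool:
    """Membership needs only two comparisons, however many characters are grouped."""
    return encoding[0] <= ch <= encoding[3]
```

The second function suggests where the space and time savings plausibly come from: a parser can test a character against the range with two comparisons instead of carrying one transition per character (26 for `a->z`).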


The convention established for <lexical_non_terminal>s
is that all characters not belonging to the defined
constructs serve to delimit those defined constructs. This
includes, of course, spacing characters such as blank, tab,
etc. Furthermore, application of this convention is
independent of any particular spacing within the
alternatives defining the constructs, so that, for example,
in the following definition of <identifier>, the fact that
spacing exists between the symbols of the first alternative
does not imply that spacing is permitted within
<identifier>s.

<identifier> ::= <identifier> a->z |
a->z !

In this case, if the resulting parser is in the process
of recognizing an <identifier>, then any character not
satisfying the specification a->z (i.e. any character that
is not a small letter) causes the recognition of the
<identifier> to terminate.
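The recognition behaviour for this <identifier> rule amounts to a simple scan. A minimal Python sketch, with a hypothetical function name, that returns the recognized text and the position where recognition stopped:

```python
from typing import Optional, Tuple

def scan_identifier(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Recognize <identifier> ::= <identifier> a->z | a->z ! starting at pos."""
    start = pos
    # Any character outside a->z (blank, digit, capital, ...) ends the construct;
    # the spacing inside the rule's alternatives plays no part in this.
    while pos < len(text) and "a" <= text[pos] <= "z":
        pos += 1
    if pos == start:
        return None  # no <identifier> starts here
    return text[start:pos], pos
```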
A.3.4 Syntax Specification - Conventions and Restrictions


In the following discussion, we describe the
conventions and restrictions that are implemented on LIS
with regard to syntax specifications. Some of these have
already been discussed, and in such cases the present
discussion summarizes and extends the previous one. Most of
the restrictions that have been implemented have to do with
characteristics of well structured and logically complete
language specifications in general, and are thus quite
independent of LIS.

All error messages delivered by LIS are directed to the
user input/output stream, normally the terminal. Many of
the procedures that verify the following conventions and
restrictions include in their messages an identification
of the BNF rule number (R) and alternative number (A) in the
form (R, A). In tracking down the rule in question, the
user may, of course, manually count the rules in the LIS
Language Definition. Perhaps more simply, he may refer to
the Semantic Source Segment computed during the activation,
in which each rule has been placed in a PL/I comment and
preceded by the label bnf_rule(R), where R is the rule
number (see the example in Section A.7).
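The manual count can be automated. This Python sketch finds the rule and alternative named by a diagnostic's (R, A) pair, counting both from 1. It assumes that no "!" or "|" characters are escaped inside terminal strings, and the function names are hypothetical.

```python
from typing import List

def split_rules(spec: str) -> List[str]:
    """Rule texts in source order (rule R is element R-1); each rule ends with "!"."""
    return [chunk.strip() for chunk in spec.split("!") if chunk.strip()]

def locate(spec: str, r: int, a: int) -> str:
    """Show alternative A of rule R (both counted from 1), as named in a diagnostic (R, A)."""
    left, sep, right = split_rules(spec)[r - 1].partition("::=")
    if not sep:
        raise ValueError(f"rule {r} has no '::='")
    alternatives = [" ".join(alt.split()) for alt in right.split("|")]
    return f"{left.strip()} ::= {alternatives[a - 1]}"
```

For example, a diagnostic citing (1, 2) against the two rules `<s> ::= <s> + n | n !` and `<t> ::= x !` points at `<s> ::= n`.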


<primary_non_terminal>


<primary_non_terminal> is reserved on LIS, and its
definition serves to identify the goal symbol of the primary
grammar. Restrictions on its use are as follows:

a. If LIS is activated for purposes of computing
both a primary and a lexical parser, or if it
is activated to compute only a primary
parser, then <primary_non_terminal> must be
defined.

b. <primary_non_terminal> must not be referenced
as a symbol in an alternative.


There is no fundamental restriction on the definition
of <primary_non_terminal>.


<lexical_non_terminal>

<lexical_non_terminal> is reserved on LIS, and its
definition serves to identify the goal symbol(s) of the lexical
grammar. Restrictions on its use are as follows:

a.  If  LIS  is  activated  for  purposes  of  computing 
both  a  primary  and  a  lexical  parser,  or  if  it 
is  activated  to  compute  only  a  lexical  parser, 
or  if  it  is  activated  to  compute  only  a 
primary  parser  and  the  primary  grammar  has 
references  to  <lexical_non_terminal>,  then 
<lexical_non_terminal>  must  be  defined. 

b.  <lexical_non_terminal>  may  only  be  defined  in 
BNF  rules  whose  alternatives  consist  of  single 
non-terminal  symbols. 

c.  <lexical_non_terminal>  must  not  be  referenced 
as  a  symbol  in  an  alternative. 

d. In those cases in which LIS is used to compute
both a primary parser and a lexical parser,
but in separate activations, using separate
LIS Language Definitions, the system requires
that the rules defining <lexical_non_terminal>
be the same in each definition, and that they
appear (in the same order) as the first rules
in each definition.


<non_lexical>

<non_lexical> is reserved on LIS, and its definition
serves to identify those characters that are to be the
explicit delimiters and spacing characters of the primary
grammar. Restrictions on its use are as follows:

a. <non_lexical> should be defined with the
lexical grammar in those cases in which the
primary and lexical grammars are in separate
LIS Language Definitions.

b. <non_lexical> may only be defined to consist
of a maximum of eight single characters, i.e.
a maximum of eight alternatives, each
consisting of a single symbol, a character.

c. <non_lexical> must not be referenced.

d. If not defined, <non_lexical> assumes the
default values of blank, tab, new-line, and
new-page characters.
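The restrictions on the reserved non-terminals above are mechanical enough to check automatically. The following Python sketch shows one way such checks might look; the grammar representation (a dictionary from each non-terminal to its alternatives) and the names `check_reserved` and `non_lexical_chars` are hypothetical, not part of LIS, and only the referencing and shape restrictions are covered.

```python
from typing import Dict, List

# Hypothetical grammar representation (not an LIS data structure): each
# non-terminal maps to its alternatives; each alternative is a list of
# symbols. Non-terminals are written "<name>", terminals are plain strings.
Grammar = Dict[str, List[List[str]]]

# Reserved non-terminals that must never appear inside an alternative.
NOT_REFERENCEABLE = ("<primary_non_terminal>", "<lexical_non_terminal>",
                     "<non_lexical>")

# <non_lexical> default: blank, tab, new-line and new-page.
DEFAULT_NON_LEXICAL = [" ", "\t", "\n", "\f"]


def is_nonterminal(symbol: str) -> bool:
    return len(symbol) > 2 and symbol[0] == "<" and symbol[-1] == ">"


def check_reserved(grammar: Grammar) -> List[str]:
    """Report violations of the restrictions on the reserved non-terminals."""
    errors = []
    for lhs, alternatives in grammar.items():
        for alt in alternatives:
            for reserved in NOT_REFERENCEABLE:
                if reserved in alt:
                    errors.append(f"{reserved} referenced in rule for {lhs}")
    # <lexical_non_terminal>: every alternative is a single non-terminal.
    for alt in grammar.get("<lexical_non_terminal>", []):
        if len(alt) != 1 or not is_nonterminal(alt[0]):
            errors.append("<lexical_non_terminal> alternative is not a "
                          "single non-terminal")
    # <non_lexical>: at most eight alternatives, each one character.
    non_lexical = grammar.get("<non_lexical>", [])
    if len(non_lexical) > 8:
        errors.append("<non_lexical> has more than eight alternatives")
    for alt in non_lexical:
        if len(alt) != 1 or len(alt[0]) != 1:
            errors.append("<non_lexical> alternative is not a single character")
    return errors


def non_lexical_chars(grammar: Grammar) -> List[str]:
    """Delimiter and spacing characters, falling back to the LIS default."""
    if "<non_lexical>" not in grammar:
        return list(DEFAULT_NON_LEXICAL)
    return [alt[0] for alt in grammar["<non_lexical>"]]
```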


<any_string>

LIS has reserved the non-terminal <any_string> so as
to admit the convenient and efficient representation of
language constructs such as quoted strings. The use of
<any_string> is restricted so that:

a. <any_string> must not be defined.

b. An alternative in which <any_string> is
referenced must consist of exactly three
symbols, the first and third of which are
terminal character strings, and the second of
which is <any_string>.

The interpretation to be associated with the use of
<any_string> is that the resulting parser, upon detecting
the first terminal character string, will accept all
characters up to an occurrence of the last terminal string
as belonging to <any_string> for that construct.

An example of the use of <any_string> is the following
definition of <quoted_string>:

    <quoted_string> ::= " <any_string> " |
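A minimal Python sketch of the <any_string> interpretation described above, assuming the construct is closed by the first occurrence of the closing terminal string after the opening one. The function name `scan_any_string` is hypothetical, and escaping is ignored.

```python
from typing import Optional, Tuple


def scan_any_string(text: str, pos: int, opener: str,
                    closer: str) -> Optional[Tuple[str, int]]:
    """Recognize  opener <any_string> closer  beginning at text[pos].

    Returns (any_string_text, index just past the closer), or None if the
    opener is not at pos or no closer follows it.
    """
    if not text.startswith(opener, pos):
        return None
    body_start = pos + len(opener)
    # Accept every character up to the (first) occurrence of the closer.
    end = text.find(closer, body_start)
    if end < 0:
        return None
    return text[body_start:end], end + len(closer)
```

For the <quoted_string> rule both delimiters are `"`, so `scan_any_string(text, pos, '"', '"')` returns the characters between the quotes together with the position just past the closing quote.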

Escaping Conventions

An escaping convention has been established on LIS so
as to permit an alternate representation of ASCII
characters. The escaping convention may be applied to any
character in any rule, although when applied to character
strings that are key to the LIS version of BNF ("<", ">",
"::=", "->", and "|"), the convention is that these symbols
lose their key status. The escaping character is the
apostrophe ('), and it may be followed either by the ASCII
graphic representation of the character (e.g. '$) or by the
Multics ASCII code of the character, in octal (e.g. '044). To
escape the apostrophe, a double apostrophe ('') is used.
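The escaping convention can be sketched as a small decoder. The Python below is an illustration only: the name `decode_escapes` is hypothetical, and it assumes that an apostrophe followed by exactly three octal digits denotes a Multics ASCII code (so '044 is "$" and '040 is a blank), while an apostrophe followed by anything else stands for that next character. How LIS itself resolves a case such as '0 followed by further digits is not stated here, and the loss of key status is not modeled.

```python
OCTAL_DIGITS = "01234567"


def decode_escapes(text: str) -> str:
    """Decode the apostrophe escaping convention (illustrative sketch)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "'":
            out.append(text[i])
            i += 1
            continue
        code = text[i + 1:i + 4]
        if len(code) == 3 and all(c in OCTAL_DIGITS for c in code):
            out.append(chr(int(code, 8)))     # e.g. '044 -> "$"
            i += 4
        elif i + 1 < len(text):
            out.append(text[i + 1])           # e.g. '$ -> "$", '' -> "'"
            i += 2
        else:
            raise ValueError("escape character at end of text")
    return "".join(out)
```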


References and Definitions of Non-Terminals

All non-terminals that are referenced must also be
defined, with the following exceptions:

a. <any_string> must not be defined.

b. If LIS is activated for purposes of computing
only a primary parser, then those
non-terminals that are <lexical_non_terminal>s
need not (although they may) be defined.

Un-needed Productions

So as to aid the LIS user in debugging the syntax
specification of his language, the system has the capability
to detect and report rules and/or alternatives that are not
needed, i.e., that do not contribute constructs to the
language. Un-needed productions are identified as a result
of either structural repetition or structural gaps within
the grammar, and often these are traced to simple spelling
errors in the specification. Since it is sometimes the case
that the user expects un-needed productions in his language,
the detection of such productions will not prevent the
system from attempting to compute the appropriate parsers.
Such a case of expected un-needed productions may occur, for
example, when a language is being developed in parts, and
those parts contain definitions in common. When certain of
these parts are subsequently merged to synthesize a more
comprehensive subset of the language, un-needed productions
may result from duplicate definitions.
In the following discussion, we present the conditions
under which un-needed rules and/or alternatives may arise.

a. Duplicate Productions

In those cases in which duplicate productions
exist, only the first is used in the definition,
and the remaining productions are identified as
not contributing to the language.

b. Productions of the Form <A> ::= <A>

Productions of this form clearly do not contribute
new constructs to the language; in fact, they give
rise to syntactic ambiguity. All occurrences of
such productions are indicated as not contributing
to the language.

c. Non-Terminals not Referenced by the Grammar

In this case, we determine which non-terminals are
referenced by the grammar as follows:

i. The set of non-terminals referenced by the
grammar is initialized with the
appropriate grammar goal symbol
(<primary_non_terminal> or
<lexical_non_terminal>).

ii. The closure of the set is obtained by
applying, recursively, the rule that any
non-terminals referenced in definitions of
non-terminals that are referenced by the
grammar are also referenced by the
grammar.

Any non-terminals not identified as being
referenced by the grammar are un-needed, and all
productions which define or reference these
non-terminals are identified as not contributing
to the language.

d. Non-Terminals not Occurring in any Derivation
of Text of the Language

Satisfying the referenced condition in point c
above ensures that the non-terminals in question
are referenced by the grammar. This does not
indicate, however, whether the non-terminals
actually participate in the derivation of text of
the language. In other words, point c identifies
those non-terminals that may be found in
sentential forms of the language, whereas our
present concern is with those non-terminals that
may occur in the derivation of terminal sentential
forms. Those non-terminals that do not occur in
the derivation of some terminal sentential form
are un-needed, and productions defining or
referencing these non-terminals are identified as
not contributing to the language.
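Points c and d correspond to the textbook computations of reachable and productive (terminal-deriving) non-terminals. The Python sketch below assumes that approach, together with the duplicate and <A> ::= <A> checks of points a and b; the grammar representation, in which all alternatives of a non-terminal are merged into one list, and all function names are hypothetical rather than taken from LIS.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: non-terminal -> list of alternatives, with
# all alternatives of a non-terminal merged into one list, in order.
Grammar = Dict[str, List[List[str]]]


def is_nonterminal(symbol: str) -> bool:
    return len(symbol) > 2 and symbol[0] == "<" and symbol[-1] == ">"


def referenced(grammar: Grammar, goal: str) -> Set[str]:
    """Point c: start from the goal and close over referenced non-terminals."""
    seen = {goal}
    work = [goal]
    while work:
        for alt in grammar.get(work.pop(), []):
            for sym in alt:
                if is_nonterminal(sym) and sym not in seen:
                    seen.add(sym)
                    work.append(sym)
    return seen


def productive(grammar: Grammar) -> Set[str]:
    """Non-terminals that derive at least one string of terminals."""
    result: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs not in result and any(
                    all(not is_nonterminal(s) or s in result for s in alt)
                    for alt in alternatives):
                result.add(lhs)
                changed = True
    return result


def needed(grammar: Grammar, goal: str) -> Set[str]:
    """Point d: non-terminals used in deriving some terminal sentential form."""
    prod = productive(grammar)
    # Drop alternatives that use a non-productive symbol, then re-close.
    trimmed = {lhs: [alt for alt in alts
                     if all(not is_nonterminal(s) or s in prod for s in alt)]
               for lhs, alts in grammar.items() if lhs in prod}
    return referenced(trimmed, goal) if goal in prod else set()


def unneeded_productions(grammar: Grammar, goal: str) -> List[Tuple[str, int]]:
    """(non-terminal, alternative number) pairs that contribute nothing."""
    keep = needed(grammar, goal)
    flagged = []
    for lhs, alternatives in grammar.items():
        for number, alt in enumerate(alternatives, start=1):
            duplicate = alt in alternatives[:number - 1]     # point a
            self_loop = alt == [lhs]                         # point b
            useless = lhs not in keep or any(                # points c, d
                is_nonterminal(s) and s not in keep for s in alt)
            if duplicate or self_loop or useless:
                flagged.append((lhs, number))
    return flagged
```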


Primary and Lexical Non-Terminal Conflicts

It is an implementation restriction on LIS that the
same non-terminals may not be explicitly referenced by both
the primary and lexical grammars, with the following
exceptions:

a. <any_string> may be explicitly referenced by
both grammars.

b. Those non-terminals that constitute the
<lexical_non_terminal>s may be explicitly
referenced by both grammars.


The CLR(k) Condition

Given that each of the relevant conditions above is
satisfied, LIS will attempt to compute a parser from the
appropriate syntax specification. The successful completion
of this computation is dependent upon the language in
question being CLR(k) for k less than or equal to three.
Recall that the CLR(k) condition implies that the syntactic
identity of a particular construct may be ascertained by
looking indefinitely far to the left and at most k symbols to
the right of the current position in the parse. Failure to
satisfy this condition implies the existence of a local
syntactic ambiguity that cannot be resolved with up to k
symbols of look-ahead. The implementation has restricted k
to three because of the inefficiency incurred in parsing
languages requiring more than a limited amount of
look-ahead, and because k = 3 is likely to cover virtually
all artificial languages of "practical" interest.

Of course, in the process of developing a language on
LIS, it is likely that the user will (hopefully
inadvertently) define certain constructs that are not
CLR(k) for k ≤ 3, and in these cases the system delivers
diagnostics describing the areas of local ambiguity.

It  should  be  realized  that  the  failure  of  a  particular 
construct  to  satisfy  the  CLR(k)  condition  is  no  particular 
adverse  reflection  on  the  condition,  since  such  constructs 
represent  areas  of  complexity  that  are  likely  to  be 
difficult  to  handle  under  any  parsing  scheme.  Perhaps  more 
importantly,  they  represent  complexities  in  the  language 
that  users  of  the  language  would  probably  rather  avoid. 
A.4 The Definition of Artificial Languages -
Specification of Semantics

In this section we discuss the way in which the
language designer/implementer formulates the specification
of the semantics of his language for processing by LIS.
Semantic specification on LIS consists of initialization
semantics and BNF rule semantics, all expressed in PL/I and
employing the full capability of the language that is
appropriate at the particular point. Conclusion or wrap-up
semantics is a special case of rule semantics, being
associated with the last production that is applied during
recognition of legal text of the language. The semantic
activation sequence employed in LIS-computed language
processors is for the initialization semantics to be
activated prior to the start of the parse, and thereafter
for control to be passed to the semantics associated with a
particular BNF rule when a construct specified by an
alternative of that rule is recognized in the Input Text.

With  respect  to  the  specification  of  rule  semantics,  it 
is  clear  that  the  semantics  to  be  performed  is  dependent  on 
the  alternative  being  applied.  Therefore,  provision  has  been 
made  so  that  the  alternative  being  applied  is  identified 
when  control  is  passed  to  the  rule  semantics. 


The overall structure of an LIS Language Definition is
indicated in Figure A.2. The LIS Pre-Processor transforms
the definition into a Semantic Source Segment, and in so
doing, establishes the semantic linkages between the rules
and their semantics that are necessary at language
processing time. This transformation involves the
generation of text, which is discussed below and which is
indicated, by example, in Section A.7.

A.4.1 Initialization Semantics

Initialization  semantics  exists  so  as  to  permit  the 
language  designer/implementer  to  specify  those  semantic 
actions  to  be  performed  by  his  processor  prior  to  the 
initiation  of  the  parsers.  Initialization  semantics  is,  by 
convention,  all  text  prior  to  the  first  BNF  rule. 

It is not necessary to include a PL/I procedure
statement at the beginning of the initialization semantics,
since this is done automatically by LIS when computing the
Semantic Source Segment from the LIS Language Definition.
The label on the procedure statement is derived from the
segment name of the language definition, so that if the
segment is named "abc.lis", the label on the procedure
statement is "abc" (for information on LIS segment naming
conventions, see Section A.6). Just prior to parsing, the
initialization semantics is activated via a call to
abc_semantics$abc made by LIS Processor Control. It is the
responsibility of the initialization semantics to return
control to Processor Control (via the PL/I return statement)
upon completion of its actions.


    Initialization Semantics

    BNF Rule
        Rule Semantics (?)

    BNF Rule
        Rule Semantics (?)

        . . .

    BNF Rule
        Rule Semantics (?)

    Figure A.2  LIS Language Definition Structure


A.4.2 BNF Rule Semantics


The semantic interpretation to be associated with a
particular BNF rule is understood to be the complete (PL/I)
text between that rule and the next rule, or the end of the
segment if the rule in question is the last rule of the
definition. If the text consists of blank, tab, new-line,
and new-page characters exclusively, then it is assumed that
there is no semantic interpretation to be associated with the
rule. Therefore, during language processing, recognition of
the associated syntactic constructs will not result in a
transfer of control to the semantic segment by LIS Processor
Control. Any text other than blank, tab, new-line, or
new-page characters will be interpreted as significant
semantic text, and will result in the transfer of control
upon recognition of the associated constructs.
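The test for significant semantic text reduces to a character check. A hypothetical Python helper (the name `has_semantics` is illustrative, not an LIS routine):

```python
# Blank, tab, new-line and new-page: text made only of these characters
# means "no semantic interpretation" for the rule.
INSIGNIFICANT = frozenset(" \t\n\f")


def has_semantics(rule_text: str) -> bool:
    """True if the text after a BNF rule counts as significant semantics."""
    return any(ch not in INSIGNIFICANT for ch in rule_text)
```

By this rule, even a rule followed only by a PL/I comment counts as having semantics, so Processor Control would transfer control for it.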

The following transformations on the LIS Language
Definition are performed automatically by LIS, and
establish the necessary semantic linkage with the syntax
specification:

a. The following text is inserted between the
user-defined initialization semantics and the
first BNF rule:

    bnf_rule_semantics: entry(bnf_rule_number,
                              alternative_number);

    dcl alternative_number fixed binary(35),
        bnf_rule(1000) label,
        bnf_rule_number fixed binary(35);

    go to bnf_rule(bnf_rule_number);

alternative_number is set during calls to the
semantic procedure, and indicates which of the
possible alternatives of a rule is being
applied at the particular point during the
parse.

b. Each BNF rule is placed in a PL/I comment and
preceded by the label "bnf_rule(n)", where n
is the sequence number of the rule in the
definition (n = 1 for the first rule, n = 2
for the second rule, etc.).
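Transformations a and b can be sketched as a text-to-text function. In the Python below, the name `semantic_source_segment`, the input representation, the exact line layout, and the closing `end` statement are assumptions for illustration; only the procedure label, the inserted entry text, and the `bnf_rule(n)` labels come from the description above.

```python
from typing import List, Tuple


def semantic_source_segment(name: str, initialization: str,
                            rules: List[Tuple[str, str]]) -> str:
    """Sketch of the LIS Pre-Processor transformation.

    name           -- definition segment name without ".lis" (e.g. "abc")
    initialization -- PL/I text preceding the first BNF rule
    rules          -- (BNF rule text, rule semantics text) pairs, in order
    """
    out = [f"{name}: procedure;",  # procedure statement supplied by LIS
           initialization.rstrip("\n"),
           "",
           # Transformation a: entry, declarations and dispatching go to.
           "bnf_rule_semantics: entry(bnf_rule_number, alternative_number);",
           "dcl alternative_number fixed binary(35),",
           "    bnf_rule(1000) label,",
           "    bnf_rule_number fixed binary(35);",
           "go to bnf_rule(bnf_rule_number);",
           ""]
    for n, (rule, semantics) in enumerate(rules, start=1):
        # Transformation b: the rule becomes a comment labelled bnf_rule(n).
        # (A rule text containing "*/" would need extra handling.)
        out.append(f"bnf_rule({n}): /* {rule} */")
        out.append(semantics.rstrip("\n"))
    out.append(f"end {name};")  # assumed closing statement
    return "\n".join(out) + "\n"
```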


The way in which linkage is effected to the BNF rule
semantics may now be explained:

Assume an LIS-produced processor is processing text
written in the language specified by the definition in
segment abc.lis. When a reduction is to be made by
alternative q of BNF rule p, the following call is
made:

    call abc_semantics$bnf_rule_semantics(p, q);

This results in control being passed to the rule
semantics associated with the p-th rule, via the
statement:

    go to bnf_rule(bnf_rule_number);

where bnf_rule_number has the value p. It is the
responsibility of the rule semantics subsequently to
return control to Processor Control (via the PL/I
return statement).
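The linkage can be modeled in Python as a lookup from rule number to semantic action, with the alternative number passed along. This sketch is only an analogy for the PL/I label-array dispatch: the class `SemanticSegment` and its methods are hypothetical, and rules without significant semantic text are simply absent from the table.

```python
from typing import Callable, Dict, Optional

RuleSemantics = Callable[[int], None]   # receives alternative_number


class SemanticSegment:
    """Models abc_semantics$abc and abc_semantics$bnf_rule_semantics."""

    def __init__(self, initialization: Callable[[], None],
                 rules: Dict[int, RuleSemantics]) -> None:
        self.initialization = initialization
        # Only rules with significant semantic text have an entry here.
        self.rules = rules

    def initialize(self) -> None:
        # Processor Control calls the initialization semantics before parsing.
        self.initialization()

    def bnf_rule_semantics(self, bnf_rule_number: int,
                           alternative_number: int) -> None:
        # Stands in for  go to bnf_rule(bnf_rule_number);  the rule
        # semantics then returns to Processor Control.
        action: Optional[RuleSemantics] = self.rules.get(bnf_rule_number)
        if action is not None:
            action(alternative_number)
```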


A.4.3 References to Input Text


The language semantics gains access to the input text
by including the following declarations:

    dcl 1 text_reference_stack(100) aligned external,
          2 construct_start fixed binary(35),
          2 construct_length fixed binary(35),
        top fixed binary(35) external,

        1 input_text_struct aligned based(input_text_struct_ptr),
          2 input_text char(input_text_length),
        input_text_struct_ptr pointer external,
        input_text_length fixed binary(35) external;


References to the input text are made through the based
character string, input_text. These accesses are normally
coordinated, however, by making use of the run-time text
reference stack, text_reference_stack.


The general rules by which access to the input text is
coordinated by the text reference stack may be stated as
follows:

a. Consider a particular alternative of a BNF
rule for which semantics is to be specified.
This alternative has n symbols:

    symbol(1) symbol(2) ... symbol(i) ... symbol(n)

b. During parsing, when a reduction is to be made
according to the alternative in question, the
top n entries on the text reference stack
refer to the symbols of the alternative as
follows:

    text_reference_stack(top-n+1) refers to symbol(1)
    text_reference_stack(top-n+2) refers to symbol(2)
        ...
    text_reference_stack(top-n+i) refers to symbol(i)
        ...
    text_reference_stack(top)     refers to symbol(n)

"top" is set by LIS Processor Control just
prior to transferring control to the semantics
associated with the rule in question.

c. The elements of text_reference_stack,
construct_start and construct_length, refer,
respectively, to the starting character
position and the length, in characters, of the
first lexical construct recognized as part of
the referenced symbol.

d. The first lexical construct of the i-th symbol
of the alternative may then be accessed by way
of the PL/I built-in function, substr, as
follows:

    substr(input_text, construct_start(top-n+i),
           construct_length(top-n+i))
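Rule d can be mirrored in Python, keeping PL/I's 1-based positions. The `TextReference` record, the `substr` helper (equivalent to the PL/I built-in only for in-range arguments), and `symbol_text` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextReference:
    """One text_reference_stack entry (positions are 1-based, as in PL/I)."""
    construct_start: int
    construct_length: int


def substr(s: str, start: int, length: int) -> str:
    """In-range equivalent of the PL/I built-in substr(s, start, length)."""
    return s[start - 1:start - 1 + length]


def symbol_text(input_text: str, stack: List[TextReference],
                n: int, i: int) -> str:
    """First lexical construct of symbol(i) of an n-symbol alternative.

    The last list element plays the role of text_reference_stack(top).
    """
    top = len(stack)
    entry = stack[(top - n + i) - 1]   # PL/I index top-n+i, 0-based here
    return substr(input_text, entry.construct_start, entry.construct_length)
```

With input "x(3)" and one entry each for "x", "(", "3" and ")", `symbol_text(text, stack, 3, 2)` returns "3", the <integer> of a three-symbol <subscript> alternative.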


The following two examples illustrate the way in which
the text reference stack may be used to coordinate
references to the input text.


Example 1

A stack entry associated with a symbol of an
alternative that is a lexical construct refers to the
occurrence of that construct in the input text. For
example, suppose that semantics is to be specified for the
following rule (we assume here that <integer> is a
<lexical_non_terminal>):

    <subscript> ::= ( <integer> ) |

The stack entries that are relevant for accessing the
symbols associated with this rule are as follows:

    text_reference_stack(top)   refers to ")"
    text_reference_stack(top-1) refers to <integer>
    text_reference_stack(top-2) refers to "("

In recognizing "(3)" as a <subscript>, substr could be
applied (using the (top-1)th entry of text_reference_stack)
to gain access to the character "3". Of course, it would not
be necessary to fetch the symbols "(" and ")" in this manner,
since application of the given production implies that stack
entries top and top-2 will, a priori, refer to ")" and "(",
respectively.


Example 2

The stack entry associated with a symbol of an
alternative that is not a lexical construct does not refer
to the entire symbol, only to the first lexical construct
recognized as part of the symbol. Consider, for example,
the following two rules, with the associated text references
as indicated (we assume here that <identifier> is a
<lexical_non_terminal>):


Rule 1

    <assignment> ::= <identifier> = <expression> ; |

    text_reference_stack(top)   refers to ";"
    text_reference_stack(top-1) refers to first lexical
                                construct of <expression>
    text_reference_stack(top-2) refers to "="
    text_reference_stack(top-3) refers to <identifier>


Rule 2

    <expression> ::= <expression> + <identifier> |
                     <identifier> |

Alternative 1

    text_reference_stack(top)   refers to <identifier>
    text_reference_stack(top-1) refers to "+"
    text_reference_stack(top-2) refers to first lexical
                                construct of <expression>

Alternative 2

    text_reference_stack(top)   refers to <identifier>

Parsing of "a = b + c;" then proceeds as follows:

a. The identifier "b" would be recognized as
an <expression> (Rule 2, Alternative 2). At
this point, text_reference_stack(top) would
refer to "b".

b. "+" and "c" would be recognized, and a
reduction according to Rule 2, Alternative 1
would be performed. At the point of this
reduction, the stack entries would be as follows:

    text_reference_stack(top)   refers to "c"
    text_reference_stack(top-1) refers to "+"
    text_reference_stack(top-2) refers to "b"

c. The identifier "a" and the equal sign
having already been recognized, ";" would be
recognized, and a reduction according to Rule
1 would be performed. At the point of this
reduction, the stack entries would be as follows:

    text_reference_stack(top)   refers to ";"
    text_reference_stack(top-1) refers to "b"
    text_reference_stack(top-2) refers to "="
    text_reference_stack(top-3) refers to "a"
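The walkthrough follows from one rule: on a reduction, the n entries of the alternative are replaced by a single entry that keeps the reference of symbol(1). A Python sketch of that bookkeeping, with hypothetical function names and (start, length) pairs standing in for the construct_start and construct_length elements:

```python
from typing import List, Tuple

# A stack entry is (construct_start, construct_length), 1-based as in PL/I.
Entry = Tuple[int, int]


def shift(stack: List[Entry], start: int, length: int) -> None:
    """Push the reference for a newly recognized lexical construct."""
    stack.append((start, length))


def reduce_alternative(stack: List[Entry], n: int) -> None:
    """Replace the top n entries by a single entry for the reduced symbol.

    The new entry keeps the reference of symbol(1), i.e. the first lexical
    construct recognized as part of the reduced symbol.
    """
    first = stack[-n]
    del stack[-n:]
    stack.append(first)


def text_of(input_text: str, entry: Entry) -> str:
    start, length = entry
    return input_text[start - 1:start - 1 + length]
```

Replaying the example for "a = b + c;" (the constructs start at positions 1, 3, 5, 7, 9 and 10) leaves entries referring to "a", "=", "b" and ";" just before the Rule 1 reduction.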

Further  examples  on  the  referencing  of  lexical 
constructs  during  the  LIS  CLR(k)  parse  may  be  found  in 
Section  A. 7. 


The correspondence of the text reference stack entries
to the lexical constructs is part of the overall design of
LIS and is therefore not easily modified. However, the way
in which references are made to the actual text of the
constructs is quite flexible; changing it involves only minor
modifications to LIS Processor Control. In the above
discussions, these references were made by way of the PL/I
substr built-in function in conjunction with input_text,
construct_start, and construct_length. An alternative
referencing strategy is to enter the text of the constructs
into the stack directly, thus eliminating the rather
formidable substr expressions. Note also that this
strategy removes the requirement for maintaining the entire
Input Text during the parse. Of course, there are tradeoffs
involving the increased space requirements for the text
reference stack versus the convenience of direct reference,
and these tradeoffs will have to be evaluated for each
particular situation.
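For comparison, a Python sketch of the alternative strategy, in which the stack holds the construct text itself. The function names are hypothetical; the point is that semantics can read an entry directly, at the cost of storing each construct's characters in the stack.

```python
from typing import List


def shift_text(stack: List[str], construct: str) -> None:
    """Push the text of a lexical construct itself, not its position."""
    stack.append(construct)


def reduce_text(stack: List[str], n: int) -> str:
    """Reduce an n-symbol alternative; the result keeps symbol(1)'s text."""
    first = stack[-n]
    del stack[-n:]
    stack.append(first)
    return first
```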
A.5 LIS Processor Control


LIS Processor Control is the procedure responsible for
coordinating the language processing activity. In parsing
Input Text, it is driven by the DPDAs. With respect to
language semantics, its task is to coordinate the parse with
the user-defined semantics so that the appropriate semantics
may be activated upon recognition of the corresponding
construct.

The configuration of LIS Processor Control for a
particular language will be provided and maintained by the
LIS System Maintenance Group.

A.6 Using the Multics Implementation of LIS

The following pages describe the mechanisms by which
the language designer/implementer utilizes LIS on Multics
for purposes of developing artificial languages and their
associated processors.


lis                                              Command

In Directory: >udd>LIS>Altmanv>LIS_SYSTEM

Vernon E. Altman


Name: lis

The  lis  command  invokes  the  Language  Implementation 
System  to  process  an  LIS  Language  Definition.  The 
functional  output  of  LIS  consists  of  a  set  of  tables  (DPDAs) 
and  a  PL/I  procedure,  which  are  combined  with  LIS  Processor 
Control  to  synthesize  a  processor  for  the  language  being 
defined. 

Processing a Language Definition

The command:

    lis <path-name> [<options> ...]

invokes LIS to process the definition segment specified by
<path-name>. A directory pathname, <directory-name>, and an
entry name, <segment-name>, are derived from <path-name>.
Due to the Multics restriction on segment name lengths, the
length of <segment-name> is limited to 18 characters.


Options

The <option>s with which LIS may be invoked are as
follows:

{pl|p|l}  One of these options may be specified, with the
          following meaning:

          pl  LIS will compute DPDAs from the primary
              grammar and the lexical grammar. In
              addition, the Semantic Source Segment will
              be computed. This is the default option.

          p   LIS will compute the primary grammar DPDA.
              In addition, the Semantic Source Segment
              will be computed.

          l   LIS will compute the lexical grammar DPDA.
              In addition, the Semantic Source Segment
              will be computed.

          If this option is specified, LIS will print out the
          timing statistics of the major system modules as
          processing progresses. The timing results are
          unlikely to be of direct interest to the general
          user. However, they do give an accurate indication
          of the processing sequence and of the total
          processing cost.

          The default for this option is not to display
          timing statistics.

sem       If this option is specified, LIS will compute the
          Semantic Source Segment only. This option overrides
          any of the options {pl|p|l}.

          The default for this option is off.


If LIS is invoked with no arguments (i.e., simply
"lis"), then it will display on the user input/output stream
(normally the terminal) the form in which it expects its
arguments.
Generated Segments


LIS generates the following segments in the user's
working directory, depending on the options and validity of
the definition as indicated.


<segment-name>.lis_exec

This segment contains printable information on
the processing of the definition that should
prove to be of value to the user in developing
his language and its processor. In the current
implementation of LIS, this segment receives the
non-terminal cross-references (definitions and
references) and a listing of the key-symbols of
the language being processed. This segment is
generated in all activations of LIS except those
in which the input segment is syntactically
invalid (e.g. LIS is activated to process a PL/I
object segment), or in which the "sem" option is
specified.


<segment-name>_semantics.pl1

This is the Semantic Source Segment that LIS
computes from the LIS Language Definition. This
segment is always computed in those activations
of LIS in which an LIS Language Definition
segment is specified.


<segment-name>.dpda

This segment contains the DPDAs (parsing tables)
that are computed from the LIS Language
Definition, and that are used to drive LIS
Processor Control in the parse of Input
Text written in the defined language. This
segment is computed if and only if the submitted
grammar is found to be acceptable, both to the
LIS Pre-Processor and to the LIS CLR(k)
Generator, and if the "sem" option is not
specified.

<segment-name>.lis_dpda

This segment is simply a printable version of
<segment-name>.dpda.

Processor Control

The  configuration  of  LIS  Processor  Control  for  a 
particular  language  will  be  provided  and  maintained  by  the 
LIS  System  Maintenance  Group. 
A.7 Example of an LIS Language Definition


In this section, we present an example of the
utilization of LIS in the development of an artificial
language and its associated processor. The language defines
a simple arithmetic assignment statement, called assign.
The syntax of assign is as follows:


The Lexical Grammar

<lexical_non_terminal> ::=
     <identifier> |
     <integer> |

<identifier> ::=
     <identifier> a->z |
     a->z |

<integer> ::=
     <integer> 0->9 |
     0->9 |

<non_lexical> ::=
     '040 | '011 | '012 | '014 |


<primary_non_terminal> ::=
     <assignment_statement> |

<assignment_statement> ::=
     <identifier> = <expression> ; |

<expression> ::=
     <expression> + <term> |
     <expression> - <term> |
     <term> |

<term> ::=
     <term> * <factor> |
     <term> / <factor> |
     <factor> |

<factor> ::=
     ( <expression> ) |
     <identifier> |
     <integer> |

The processing that we perform on text of the assign
language is a simple translation into a symbolic
intermediate language, the type that may be produced by the
interpretation phase of a typical compiler, for example.
Admittedly, the example is far too simple to be of any
practical value; its purpose is simply to illustrate how the
syntax and semantics of artificial languages may be
structured for processing by LIS. As such, the example
draws on many of the language design and processor
implementation issues previously discussed.

Processor Control for our translator is an "off the
shelf" version, and is named assign. assign takes its input
from segments with the suffix ".assign".

On the following pages, we have included those
documents associated with assign that are particularly
relevant to the LIS user during development of the
language. The documents are:

1. The LIS Language Definition.

2. The Semantic Source Segment.

3. The LIS Execution Output Segment.

4. Sample Translations.
The semantics of the language is reasonably
straightforward and self-documenting. Our discussion of the
Language Definition will therefore be limited to the
following description of the output language primitives:

    add operand_1, operand_2, operand_3:
        operand_1 + operand_2 -> operand_3

    sub operand_1, operand_2, operand_3:
        operand_1 - operand_2 -> operand_3

    mult operand_1, operand_2, operand_3:
        operand_1 * operand_2 -> operand_3

    div operand_1, operand_2, operand_3:
        operand_1 / operand_2 -> operand_3

    assign operand_1, operand_2:
        operand_1 -> operand_2

The translation makes use of temporary variables, which
are indicated in the output as T<integer>.
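To make the intended output concrete, the following Python sketch reproduces the translation with a hand-written recursive-descent parser. It is not the LIS-generated CLR(k) processor: the left-recursive rules for <expression> and <term> are rewritten as loops, the names `Translator` and `translate` are hypothetical, and identifiers are assumed to include digits and "_" so that names such as operand_1 scan. Each operation is emitted at the point where the corresponding alternative of rule 7 or 8 would be reduced, so the output order matches the sample translations.

```python
import re
from typing import List

# Assumption: identifiers are a lower-case letter followed by lower-case
# letters, digits or "_", so that sample names such as operand_1 scan.
IDENTIFIER = r"[a-z][a-z0-9_]*"
INTEGER = r"[0-9]+"
TOKEN = re.compile(rf"\s*(?:({IDENTIFIER})|({INTEGER})|(\S))")


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


class Translator:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0
        self.temps = 0
        self.output: List[str] = []

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def expect(self, token: str) -> None:
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, reading {self.peek()!r}")
        self.pos += 1

    def emit(self, op: str, left: str, right: str) -> str:
        self.temps += 1                      # new temporary T<integer>
        result = f"T{self.temps}"
        self.output.append(f"{op} {left}, {right}, {result}")
        return result

    def factor(self) -> str:                 # rule 9
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            value = self.expression()
            self.expect(")")
            return value
        if re.fullmatch(rf"{IDENTIFIER}|{INTEGER}", tok):
            self.pos += 1
            return tok
        raise SyntaxError(f"reading {tok!r}")

    def term(self) -> str:                   # rule 8, left recursion as a loop
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = "mult" if self.peek() == "*" else "div"
            self.pos += 1
            value = self.emit(op, value, self.factor())
        return value

    def expression(self) -> str:             # rule 7, left recursion as a loop
        value = self.term()
        while self.peek() in ("+", "-"):
            op = "add" if self.peek() == "+" else "sub"
            self.pos += 1
            value = self.emit(op, value, self.term())
        return value

    def assignment_statement(self) -> List[str]:   # rule 6
        target = self.peek()
        if not re.fullmatch(IDENTIFIER, target):
            raise SyntaxError(f"reading {target!r}")
        self.pos += 1
        self.expect("=")
        value = self.expression()
        self.expect(";")
        if self.pos != len(self.tokens):
            raise SyntaxError(f"reading {self.peek()!r}")
        self.output.append(f"assign {value}, {target}")
        return self.output


def translate(text: str) -> List[str]:
    return Translator(text).assignment_statement()
```

On the input "a = b + c;" it yields "add b, c, T1" followed by "assign T1, a"; on the a3 input it emits the same three operations as the sample translation before raising SyntaxError at "&".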


assign

assign: arguments missing.

Arguments:

1:    <assign_input_text_segment_name>

2-3:  <options>

      p    Print parse of assign program.

      ns   Do not perform any translation semantics.


print a1.assign 1
a = b + c;


assign a1 p

add b, c, T1
assign T1, a

Assignment Translation Completed


The LIS CLR(k) Parse of a1.assign:

Line  Action

1     read a
1     read =
1     read b
1     apply (9, 2)
1     apply (8, 3)
1     see +
1     apply (7, 3)
1     read +
1     read c
1     apply (9, 2)
1     apply (8, 3)
1     see ;
1     apply (7, 1)
1     read ;
1     apply (6, 1)
1     apply (5, 1)

Parse Completed




print  a2. assign  I 
left  = 

operand.)  ♦  operand_2 

12345  *  (operand_3  +  6*operand_4)  /  operand_5 
operand_6  I 


assign a2 p

add operand_1, operand_2, T1
mult 6, operand_4, T2
add operand_3, T2, T3
mult 12345, T3, T4
div T4, operand_5, T5
sub T1, T5, T6
add T6, operand_6, T7
assign T7, left

Assignment Translation Completed


The LIS CLR(k) Parse of a2.assign:

Line  Action
1     read left
      read =
      read operand_1
      apply (9, 2)
      apply (8, 3)
      see +
      apply (7, 3)
      read +
      read operand_2
      apply (9, 2)
      apply (8, 3)
      see -
      apply (7, 1)
      read -
      read 12345
      apply (9, 3)
      apply (8, 3)
      see *
      read *
      read (
      read operand_3
      apply (9, 2)
      apply (8, 3)
      see +
      apply (7, 3)
      read +
      read 6
      apply (9, 3)
      apply (8, 3)
      see *
      read *
      read operand_4
      apply (9, 2)
      apply (8, 1)
      see )
      apply (7, 1)
      read )
      apply (9, 1)
      apply (8, 1)
      see /
      read /
      read operand_5
      apply (9, 2)
      apply (8, 2)
      see +
      apply (7, 2)
      read +
      read operand_6
      apply (9, 2)
      apply (8, 3)
      see ;
      apply (7, 1)
      read ;
      apply (6, 1)
      apply (5, 1)

Parse Completed
print a3.assign 1
left =
operand_1 + operand_2
-
12345 * operand_3 & operand_4 +
operand_5;


assign a3 p

add operand_1, operand_2, T1
mult 12345, operand_3, T2
sub T1, T2, T3

Syntax error on line 4, reading: "& op...".
Translation terminated.


The LIS CLR(k) Parse of a3.assign:

Line  Action
1     read left
1     read =
2     read operand_1
2     apply (9, 2)
2     apply (8, 3)
2     see +
2     apply (7, 3)
2     read +
2     read operand_2
2     apply (9, 2)
2     apply (8, 3)
3     see -
3     apply (7, 1)
3     read -
4     read 12345
4     apply (9, 3)
4     apply (8, 3)
4     see *
4     read *
4     read operand_3
4     apply (9, 2)
4     apply (8, 1)
4     see &
4     apply (7, 2)

Parse Terminated
Appendix B

LIS - A Macro System Description

B.1 Introduction

In  this  appendix,  we  present  a  macro  description  of  the 
Language  Implementation  System.  The  description  includes 
flowcharts  of  the  major  system  procedures  and  descriptions 
of the major system data structures. This macro description, in conjunction with the algorithmic description in Chapter II, should provide enough information on the system design so that "persons having ordinary skill in the art to which said subject matter pertains" may reproduce the implementation of LIS.
B.2 Major System Procedures

In the following discussion, we present the design of the major procedures of the Language Implementation System. In addition, we indicate the size of each procedure by giving the number of PL/I statements (including comments) comprising the procedure, and by giving the procedure's object segment size (36 bits/word).


B.2.1 lis


When activated at its primary entry point, lis, or at the secondary entry point, debug_monitor_entry, the lis procedure controls the processing of an LIS Language Definition by the Language Implementation System. In such activations, its function is essentially that of a computational dispatcher, in the sense that its own capabilities are restricted to the controlled activation of the major system modules of Processor Generation.


The entry points of the lis procedure are as follows:

lis

This is the primary entry point of lis and is invoked by the language designer/implementer for purposes of processing an LIS Language Definition.

debug_monitor_entry

This is a secondary entry point of lis and is invoked by the lis_debug_monitor procedure when modifying and debugging LIS.

report_time

This is a secondary entry point of lis and is invoked at various points within LIS for purposes of system timing. Since it does not contribute to the functional capability of the system, there are no further references to report_time within the subsequent system description.


Procedure Size:

Source: 168 PL/I Statements
Object: 1137 Words


Flowchart of lis:

report_time entry:
  Call operating system timing procedure and determine CPU time
  elapsed for procedure being timed. Issue timing message.

lis entry:
  LIS activated for production (non-debugging) purposes.

debug_monitor_entry:
  LIS activated by lis_debug_monitor for debugging purposes.

call validate_definition
  Decision: valid definition?
call analyze_grammar
  Decision: grammar acceptable?
  Decision: lexical recognizer to be generated?
  Decision: primary recognizer to be generated?

Primary recognizer:
  Call initialize_generator to initialize processor generator from primary grammar.
  Call compute_cfsm to compute CFSM from primary grammar.
  Call convert_cfsm_to_dpda to convert primary CFSM to primary DPDA.
  Decision: unresolved inadequate states?
  Call optimize_dpda to optimize contents of primary DPDA.
  Call multics_dpda_optimization to optimize representation of primary DPDA for Multics environment.

Lexical recognizer (connector lis_lexical):
  Call initialize_generator to initialize processor generator from lexical grammar.
  Call compute_cfsm to compute CFSM from lexical grammar.
  Call convert_cfsm_to_dpda to convert lexical CFSM to lexical DPDA.
  Decision: unresolved inadequate states?
  Call optimize_dpda to optimize contents of lexical DPDA.
  Call multics_dpda_optimization to optimize representation of lexical DPDA for Multics environment.

return


B.2.2 validate_definition


The validate_definition procedure performs the following basic functions:

a. The procedure processes the arguments with which LIS was activated.

b. The procedure processes the submitted grammar by validating its syntax and entering the grammar into the grammar structures (Section B.3.1).

c. The procedure computes the Semantic Source Segment.


In performing the above functions, validate_definition makes use of the following internal procedures:

move_rule

This procedure is invoked for each BNF rule of the LIS Language Definition, and for the i-th such rule it appends to the Semantic Source Segment the PL/I label variable, bnf_rule(i), and the i-th rule enclosed in PL/I comment delimiters (e.g. /* i-th rule */). The text index (into the LIS Language Definition segment) is left pointing at the character just beyond the end of the i-th BNF rule.

fetch_non_terminal

When this procedure is invoked, the text index is pointing at a character that is suspected to be the first character of a non-terminal of the grammar (e.g. "<"). fetch_non_terminal then determines if, in fact, a non-terminal exists which starts at that point. If so, the non-terminal is entered into the non-terminal structure (this structure contains an entry for each unique non-terminal in the grammar; see Section B.3.2), the index of its entry in the structure is returned, and the text index is advanced to the end of the non-terminal (e.g. ">"). If a non-terminal is not detected, then an entry index of zero is returned and the text index is not modified.

fetch_terminal_string

When this procedure is invoked, the text index is pointing at a character that is known to be the start of a terminal string of the grammar. fetch_terminal_string then determines the length of the terminal string by scanning for the first non-escaped blank, tab, new line, or new page character. The input text index is advanced to the end of the terminal string, and the starting index and length of the terminal string are passed to the calling procedure.

get_bnf_rule_bounds

This procedure scans the LIS Language Definition segment from the current text position, looking for the next BNF rule of the grammar. If a rule is found, its starting and ending positions are passed to the calling procedure. If no rule is found, values of zero are returned for the starting and ending positions. The text index is not modified by this procedure.

get_non_space

This procedure advances the text index from its current position to the next character that is not a blank, tab, new line, or new page character.

move_rule_semantics

This procedure moves the text of the LIS Language Definition segment between the end of the current BNF rule and the start of the next BNF rule (or end of segment, if the current BNF rule is the last rule of the Definition) into the Semantic Source Segment as the semantics associated with the current rule. If the only characters of such text are blank, tab, new line, and new page characters, then there is assumed to be no meaningful semantics associated with the rule and the semantics bit of the rule is set to '0'b. Otherwise the semantics bit of the rule is set to '1'b. The text index is set to the start of the next BNF rule or end of segment, as appropriate.
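The rule-semantic linkage built by move_rule and move_rule_semantics can be sketched in Python as follows. This is a minimal sketch, assuming the Definition has already been split into (rule, following text) pairs, roughly as get_bnf_rule_bounds would delimit them; build_semantic_source is a hypothetical name, and trimming the semantic text is a simplification of the verbatim move described above.

```python
def build_semantic_source(rules):
    """Sketch of move_rule plus move_rule_semantics.

    rules is a list of (bnf_rule_text, following_text) pairs in Definition
    order; following_text is whatever appears before the next rule.
    Returns the Semantic Source Segment text and one semantics bit per rule.
    """
    white_space = " \t\n\f"  # blank, tab, new line, new page
    segment = []
    semantics_bits = []
    for i, (rule_text, following_text) in enumerate(rules, start=1):
        # move_rule: label bnf_rule(i) followed by the rule as a PL/I comment.
        segment.append(f"bnf_rule({i}): /* {rule_text} */")
        # move_rule_semantics: white space alone means no meaningful semantics.
        has_semantics = following_text.strip(white_space) != ""
        semantics_bits.append(has_semantics)  # True ~ '1'b, False ~ '0'b
        if has_semantics:
            segment.append(following_text.strip(white_space))  # trimmed here
    return "\n".join(segment), semantics_bits
```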
Flowchart of validate_definition (connectors validate_definition_A, validate_definition_B, next_symbol):

next_symbol:
  Initialize grammar_symbols for next symbol of current alternative.
  Position Definition segment text index at start of symbol.
  Decision: at end of current alternative?
  Fetch the symbol via call to fetch_non_terminal or
  fetch_terminal_string, as appropriate.
  Enter symbol into grammar_symbols.

If no symbols for alternative, emit error diagnostic; definition is invalid.
If no alternatives for BNF rule, emit error diagnostic; definition is invalid.
Call move_rule to place BNF rule in Semantic Source Segment as a PL/I comment.
Insert PL/I label variable just prior to comment so as to effect rule-semantic linkage.


B.2.3 analyze_grammar


This procedure analyzes the submitted grammar to verify that certain conventions and linguistic structural requirements are satisfied. The implementation of the analysis procedures is indicated in the following flowcharts. A user-oriented discussion of the conditions to be satisfied is given in Appendix A.


analyze_grammar makes use of the following internal procedure:

calculate_non_terminal_cross_refs

This procedure determines, for each non-terminal in the grammar, the BNF rules in which the non-terminal is defined and the alternatives in which the non-terminal is referenced. This cross-reference information is entered into the non-terminal structures (Section B.3.2) and is also written into the execution segment (".lis_exec") in printable text form.
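A minimal Python sketch of the cross-reference computation follows. It assumes non-terminals are spelled <name> and numbers alternatives within each rule, mirroring the apply (rule, alternative) pairs that the sample parse traces appear to use; cross_references and is_non_terminal are hypothetical names.

```python
from collections import defaultdict


def is_non_terminal(symbol):
    # Assumption: non-terminals are written <name>.
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")


def cross_references(rules):
    """rules: list of (defined_non_terminal, alternatives); each alternative is
    a list of symbols. Rules and alternatives are numbered from 1.

    Returns (defined_in, referenced_in): the rule numbers defining each
    non-terminal, and the (rule, alternative) pairs referencing it."""
    defined_in = defaultdict(list)
    referenced_in = defaultdict(list)
    for rule_no, (lhs, alternatives) in enumerate(rules, start=1):
        defined_in[lhs].append(rule_no)
        for alt_no, symbols in enumerate(alternatives, start=1):
            for symbol in symbols:
                if is_non_terminal(symbol) and (rule_no, alt_no) not in referenced_in[symbol]:
                    referenced_in[symbol].append((rule_no, alt_no))
    return dict(defined_in), dict(referenced_in)
```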


Procedure Size:

Source: 954 PL/I Statements
Object: 7319 Words
Flowchart of analyze_grammar (connectors analyze_grammar_A, analyze_grammar_B, analyze_grammar_C):

analyze_grammar_B:

Productions defining or referencing non-terminals not occurring in any derivation of text of the language (terminal sentential form) do not contribute to the grammar, and hence may be deleted. Detecting such non-terminals is accomplished by tracing through the grammar in a bottom-up fashion (Gin 66).

"Un-needed" productions discovered by the above techniques do not halt processing of the language definition. At this point analyze_grammar emits diagnostics identifying the "un-needed" productions and the reasons why they did not contribute to the grammar.

Because of the way in which we store the configuration information (see initialize_generator), we place the restriction that, with the exception of those non-terminals defined to be lexical non-terminals, no non-terminal may be explicitly referenced by both the primary and the lexical grammars. For each grammar, we determine which non-terminals are referenced by the previous technique, and issue diagnostics in those cases in which conflicts occur. Such conflicts make the grammar unacceptable.

analyze_grammar_C:

If LIS was activated by lis_debug_monitor, and if the ambiguity check option was specified, analyze_grammar now checks the submitted grammar for simultaneous left and right recursion, a common type of syntactic ambiguity. The technique is to form a bit matrix consisting of the production heads versus the goal symbol (the non-terminal being defined) for each production of the grammar, and another matrix consisting of the production tails versus the goal symbol for each production of the grammar. The transitive closure is then taken on each matrix, and simultaneous left and right recursion exists whenever a bit is "on" in the bit vector formed by taking the boolean product of the diagonals (top left to bottom right) of the two resulting matrices. The technique is extremely time consuming and thus is accessible only through lis_debug_monitor. Since the LR(k) technique will not generate a recognizer for an ambiguous grammar anyway, it turns out in practice to be more efficient to let the generation technique detect and report syntactic ambiguity (in addition to simultaneous left and right recursion). An ambiguous grammar is unacceptable.

If lis_debug_monitor was used to activate LIS, and if it was requested that the grammar structures be displayed, then a call is made to lis_debug_monitor$print_grammar_structures to print the structures.

return
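The ambiguity check just described can be sketched in Python: build the head and tail matrices over the non-terminals, close each with Warshall's algorithm, and AND the two diagonals element by element. The function names are hypothetical, and the sketch assumes no alternative is empty, so nullable prefixes or suffixes are not considered.

```python
def transitive_closure(matrix):
    """Warshall's algorithm on a square boolean matrix (list of lists)."""
    n = len(matrix)
    closure = [row[:] for row in matrix]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True
    return closure


def simultaneous_recursion(rules):
    """rules: list of (non_terminal, alternatives), each alternative a list of
    symbols. Returns the non-terminals that are both left and right recursive."""
    names = list(dict.fromkeys(lhs for lhs, _ in rules))
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    heads = [[False] * n for _ in range(n)]  # row: goal symbol, column: head
    tails = [[False] * n for _ in range(n)]  # row: goal symbol, column: tail
    for lhs, alternatives in rules:
        for symbols in alternatives:
            if not symbols:
                continue
            if symbols[0] in index:
                heads[index[lhs]][index[symbols[0]]] = True
            if symbols[-1] in index:
                tails[index[lhs]][index[symbols[-1]]] = True
    heads_plus = transitive_closure(heads)
    tails_plus = transitive_closure(tails)
    # Boolean product of the two diagonals, element by element.
    return [name for i, name in enumerate(names) if heads_plus[i][i] and tails_plus[i][i]]
```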
When activated at its primary entry point, initialize_generator performs the initialization necessary to compute the CLR(k) parser of the submitted grammar (primary or lexical). The initialization performed is as follows:

a) Various temporary files are created and opened (Section B.3).

b) The basic configuration information is computed and entered into the configuration information structures (Section B.3.4).

c) The apply configuration information is computed and entered into the configuration information structures.


The secondary entry points of initialize_generator are as follows:

enter_key_symbols

This entry is used to create the key-symbol structures (Section B.3.3). When called with a key-symbol of the primary grammar, the entry scans the key-symbol structures to determine if the symbol has already been entered. If so, the index of the symbol within the structures is returned. If the key-symbol was not found in the structures, then it is appended to the structures, and the index that is returned is therefore the number of symbols in the structures.
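The lookup-or-append behaviour of enter_key_symbols amounts to the following Python sketch. It assumes 1-based indices, as PL/I would use, and the function name enter_key_symbol is hypothetical.

```python
def enter_key_symbol(key_symbols, symbol):
    """Return the 1-based index of symbol in key_symbols, appending it if absent."""
    for index, existing in enumerate(key_symbols, start=1):
        if existing == symbol:
            return index
    key_symbols.append(symbol)
    # The new symbol is last, so its index equals the number of symbols.
    return len(key_symbols)
```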

fetch_effective_terminal_string 

This entry accepts a string of terminal characters from the LIS Language Definition segment (the characters constituting either a non-terminal or a terminal symbol of the grammar) and converts it to a canonical form by resolving all escaped characters in the string.


Procedure Size:

Source: 502 PL/I Statements
Object: 4447 Words
Flowchart of initialize_generator (continued; connector initialize_generator_B):

do for each rule in grammar
  do for each alternative of rule
    Determine number of stack entries that will be popped when this
    production is applied. If grammar is primary, number equals number
    of symbols in alternative. If grammar is lexical, number equals sum
    of number of non-terminals in alternative and number of characters
    in effective (with escapes resolved) terminal strings, counting
    special lexical encoding as a single character.
If grammar type is primary:
  Enter non-terminals defined to be <lexical_non_terminal> into key-symbols table.
Print key-symbols into execution segment and close segment.
If lis_debug_monitor was used to activate LIS and if it was requested
that the configuration information be displayed, then call
lis_debug_monitor$print_configuration_information to print the information.
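The pop-count rule in the flowchart above can be written as a small Python sketch. The escape character and the helper names are hypothetical, since the LIS escape convention and its special lexical encodings are not spelled out here; the sketch simply counts an escape plus the character it escapes as one character.

```python
ESCAPE = "\\"  # hypothetical escape character, not the documented LIS convention


def is_non_terminal(symbol):
    # Assumption: non-terminals are written <name>.
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")


def effective_length(terminal):
    """Characters in a terminal string once escapes are resolved."""
    length, i = 0, 0
    while i < len(terminal):
        # An escape plus the character it escapes counts as one character.
        i += 2 if terminal[i] == ESCAPE and i + 1 < len(terminal) else 1
        length += 1
    return length


def pop_count(symbols, grammar_type):
    """Stack entries popped when an alternative (list of symbols) is applied."""
    if grammar_type == "primary":
        return len(symbols)
    # Lexical grammar: one entry per non-terminal plus one per effective character.
    return sum(1 if is_non_terminal(s) else effective_length(s) for s in symbols)
```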


B.2.5 compute_cfsm


This procedure accepts the grammar structures (Section B.3.1), the non-terminal structures (Section B.3.2), the key-symbol structures (Section B.3.3), and the configuration information structures (Section B.3.4), and computes the Characteristic Finite State Machine (CFSM) from the specified grammar (primary or lexical). The algorithm is that of Knuth-Earley (Ear 70), modified so as to compute either lexical or primary parsers by the method of iterative computation of configurations. The representation and interpretation of the configurations are discussed in Sections B.3.4 and B.3.5 (see Chapter III, Section III.C).
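For orientation, a CFSM in this sense is the characteristic (LR(0)) machine of the grammar. The Python sketch below uses the standard textbook closure/goto construction rather than the modified Knuth-Earley configuration computation that LIS uses; configurations are represented as hypothetical (lhs, rhs, dot) triples, and the augmented goal symbol name GOAL is an assumption.

```python
from collections import deque

GOAL = "$goal"  # hypothetical name for the augmented start symbol


def closure(items, grammar):
    """Close a set of configurations (lhs, rhs, dot) under prediction."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for alternative in grammar[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol, grammar):
    """Advance the dot over symbol in every configuration that allows it."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar)


def compute_cfsm(grammar, start):
    """grammar maps each non-terminal to a list of right-hand-side tuples.
    Returns the list of states and a dict (state_number, symbol) -> state_number."""
    initial = closure({(GOAL, (start,), 0)}, grammar)
    states, number = [initial], {initial: 0}
    transitions = {}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        symbols = sorted({rhs[dot] for _, rhs, dot in state if dot < len(rhs)})
        for symbol in symbols:
            target = goto(state, symbol, grammar)
            if target not in number:
                number[target] = len(states)
                states.append(target)
                queue.append(target)
            transitions[(number[state], symbol)] = number[target]
    return states, transitions
```

For the grammar <s> ::= ( <s> ) | x this yields six states and seven transitions.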

compute_cfsm makes use of the following internal procedure:

complete_configuration

This procedure is invoked to complete configuration(current_configuration + 1).


Procedure Size:

Source: 512 PL/I Statements
Object: 2873 Words


B.2.6 convert_cfsm_to_dpda


This  procedure  converts  the  CFSM  computed  by 
compute_cfsm  into  a  Deterministic  Push  Down  Automaton  (DPDA) 
with  look-ahead,  and  enters  the  DPDA  into  initial_dpda 
(Section  B.3.10).  First  the  procedure  converts  each  CFSM 
state  with  read  transitions  into  a  separate  DPDA  read  state. 
In  so  doing,  all  read  transitions  on  non-terminals  are 
deleted,  except  that  when  converting  a  primary  CFSM,  those 
non-terminals  that  are  <lexical.non_terminal>s  are  retained. 

As  a  first  step  in  the  conversion  of  the  apply  states, 
the  internal  procedure,  compute_look_back,  is  invoked  to 
determine,  for  each  CFSM  state  containing  at  least  one  apply 
transition,  the  set  of  look-back  states  that  is  appropriate 
to  the  production(s)  being  applied  in  the  state  (see 
discussion of look-back structures, Section B.3.9).
Following  the  computation  of  the  look-back  sets,  each  apply 
transition  of  the  CFSM  is  converted  into  a  separate  DPDA 
apply  state.  In  converting  each  transition,  a  search  is 
made  of  the  look-back  set  associated  with  the  CFSM  state  to 
which the transition belongs, and an entry is made in the
corresponding  DPDA  apply  state's  list  of  top  states  for  each 
such  look-back  state  of  the  set  from  which  the  production 
being  applied  may  have  originated. 
For each inadequate state of the CFSM, convert_cfsm_to_dpda attempts to resolve the inadequacy by converting the state into a DPDA look-ahead state. Resolution is attempted by looking ahead a maximum of three symbols. The external procedure look_ahead (see Section B.2.7) is invoked for purposes of determining look-ahead symbols, and the internal procedure intersecting_look_ahead_sets is invoked to determine if the inadequacy has been resolved.
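The resolution test can be pictured as follows: an inadequate state counts as resolved once the look-ahead sets that select its competing transitions are pairwise disjoint. This Python sketch assumes k is increased one symbol at a time up to three and that look-ahead strings are tuples of exactly k symbols; look_ahead is passed in as a hypothetical callback standing in for the look_ahead procedure, and the function names are illustrative only.

```python
from itertools import combinations


def inadequacy_resolved(look_ahead_sets):
    """look_ahead_sets maps each competing action of an inadequate state to the
    set of look-ahead strings under which it applies. The inadequacy is
    resolved when no string selects two different actions."""
    return all(a.isdisjoint(b) for a, b in combinations(look_ahead_sets.values(), 2))


def resolve_inadequacy(look_ahead, max_k=3):
    """Try k = 1, 2, ..., max_k. Returns (k, sets) on success, None otherwise."""
    for k in range(1, max_k + 1):
        sets = look_ahead(k)
        if inadequacy_resolved(sets):
            return k, sets
    return None
```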

Procedure Size:

Source: 602 PL/I Statements
Object: 3325 Words
B.2.7 look_ahead


This procedure is invoked to compute k symbols of look-ahead (k ≤ 3) for the specified inadequate CFSM state. In performing the look-ahead, the procedure makes use of the following internal procedure:

generate_contour

This procedure expands an initial generation_contour and enters the resulting read and look-ahead states into look_ahead_contour(current_contour) (see discussion of look-ahead structures, Section B.3.11).


Procedure Size:

Source: 335 PL/I Statements
Object: 2084 Words
B.2.8 optimize_dpda


This procedure optimizes the contents of the DPDA produced by convert_cfsm_to_dpda, and enters the optimized DPDA into final_dpda (Section B.3.12). Transformations are performed on the DPDA that remove superfluous and redundant information, so that the resulting DPDA is more efficient than its predecessor. The optimizations that have been implemented by no means exhaust the potential for DPDA content optimization. Other optimizations, such as sorting transitions according to empirical measures of how frequently each transition occurs for a particular language, transition hashing, and detection and deletion of apply states that have no associated semantics and that do not modify the DPDA state stack, are but a few of the optimizations that have not been implemented in LIS but that could have significant impact on parser space and time efficiency.

The  representation  of  the  DPDA  read  states  is  such  that 
the  information  regarding  the  states  themselves  is  stored 
separately from the information on the states' transitions.
This  being  the  case,  we  can  optimize  the  read  transitions  by 
deleting  duplicate  transition  sequences  that  may  arise  from 
different  read  states. 
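Because transitions are stored apart from the states, identical transition sequences can be stored once and shared. A minimal Python sketch, assuming each read state is given as a list of (symbol, destination) pairs; the function name and the (start, length) layout are hypothetical.

```python
def share_transition_lists(states):
    """states: one list of (symbol, destination) transitions per DPDA read state.

    Each distinct transition sequence is stored once in a single table; a state
    whose transitions duplicate an earlier state's reuses that state's entries.
    Returns the table and a (start_index, length) pair for every state.
    """
    table = []
    start_of = {}
    layout = []
    for transitions in states:
        key = tuple(transitions)
        if key not in start_of:
            start_of[key] = len(table)
            table.extend(key)
        layout.append((start_of[key], len(key)))
    return table, layout
```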
There are two fundamental optimizations that are performed on the DPDA apply states. First, for each apply state, we determine the most popular look-back transition destination state, and designate it the default destination state. Then all look-back transitions (also called top transitions or apply transitions) of the state whose destination state is the default destination state are deleted from the list of look-back transitions. The default destination state is then appended to the list, it being the convention that, during the language recognition process, should the top of the state stack (after being popped) fail to match any of the look-back states in the list for the current apply state, then the transition to the default destination state is automatically taken. Since the LR(k) recognition process is deterministic, we are guaranteed not to introduce any errors into the recognition process by performing this optimization.
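The default-destination optimization, together with its recognition-time interpretation, can be sketched in Python as below. The names and the (look_back_state, destination) pair representation are assumptions; ties in popularity go to the destination encountered first, a choice the text does not specify.

```python
from collections import Counter


def add_default_destination(top_transitions):
    """top_transitions: (look_back_state, destination) pairs of one apply state.

    The most popular destination becomes the default: its explicit pairs are
    deleted and (None, default) is appended as the final entry."""
    counts = Counter(destination for _, destination in top_transitions)
    default = counts.most_common(1)[0][0]
    kept = [(top, dest) for top, dest in top_transitions if dest != default]
    kept.append((None, default))
    return kept


def apply_destination(optimized, top_of_stack):
    """Recognition-time lookup: the first matching look-back state wins; if none
    matches, the default destination (last entry) is taken."""
    for top, destination in optimized[:-1]:
        if top == top_of_stack:
            return destination
    return optimized[-1][1]
```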

The second optimization that we apply to the DPDA apply states is analogous to the optimization applied to the DPDA read states: duplicate transition sequences are deleted, thus removing redundant information.

The number of optimizations performed on the DPDA look-ahead states is one or two, depending on whether the grammar in question is lexical or primary, respectively. In either case, duplicate look-ahead transitions for a given look-ahead state are deleted. In the case of a parser computed from a primary grammar, an additional optimization is performed on the look-ahead states which is analogous to the first of the optimizations applied to the DPDA apply states. Thus, for each look-ahead state, we determine the most popular look-ahead transition destination state, and designate it the default destination state. Any look-ahead transition whose destination state matches the default destination state is deleted from the list of look-ahead transitions for the state in question, and the default destination state is appended to the end of the list. The recognition-time interpretation of the default look-ahead transition is analogous to the recognition-time interpretation of the default apply transition. However, in the present case, the detection of an erroneous symbol in the input stream will be delayed until a subsequent read state, whereas were the default destination optimization not performed, such an error would be detected in the look-ahead state.


In  addition  to  performing  the  above  optimizations, 
optimize_dpda  moves  the  key-symbol  structures  into  the 
optimized  DPDA  for  those  DPDAs  computed  from  a  primary 
grammar.

optimize_dpda makes use of the following internal procedure:

print_dpda

This procedure is invoked after the DPDA has been optimized (and the key-symbol structures moved, if appropriate) and simply writes the optimized DPDA into the ".lis_dpda" segment in printable text form.


Procedure Size:

Source: 950 PL/I Statements
Object: 7584 Words
Flowchart of optimize_dpda (connectors Optimize_Apply, Optimize_Look_Ahead, Move_Key_Symbols):

Create and open the temporary file that will contain the final DPDA.
(In the procedure multics_dpda_optimization, this DPDA will be further
optimized for space and time efficiency on Multics. However, the final
DPDA as produced by optimize_dpda is suitable for parsing.)

Optimize and Move DPDA Read States:
do i = 1 to n_read_states
  Set up final DPDA for new read state.
  If the transitions out of the j-th DPDA read state are the same as
  those out of the i-th DPDA read state, then set the transition index
  of the j-th state to the start of the transitions for the i-th state
  and go to read_loop.
  Move the DPDA read state from the initial DPDA to the final DPDA.
read_loop: end

Optimize_Apply: Optimize and Move DPDA Apply States:
do for each DPDA apply state
  Set up final DPDA for new apply state.
  Go through the set of top states that is associated with the DPDA
  apply state, and determine the destination state that is most
  referenced in the top transitions. This destination state is the
  default destination state.
  do for each top state of DPDA apply state
    Move the top transition from the initial DPDA to the final DPDA.
  Enter the default destination as the last transition of the DPDA
  apply state in the final DPDA.
  If the set of top transitions just computed already exists in the
  final DPDA, do not duplicate it; rather, set the new DPDA apply state
  transition index so that it references the start of the previously
  existing set.

Optimize_Look_Ahead
Move_Key_Symbols


B.2.9 multics_dpda_optimization

This  procedure  optimizes  the  representation  of  the 
final  DPDA  on  the  Multics  system,  and  in  so  doing,  enhances 
the  space  and  time  efficiency  of  the  parsers  that  execute  on 
Multics.  The  optimizations  performed  are  representative  of 
the type of "fine tuning" that should be considered when LIS
is  used  to  produce  parsers  for  a  particular  computing 
environment.  Being  only  representative,  they  by  no  means 
exhaust  the  type  of  optimizations  that  can  be  performed, 
even  on  Multics.  The  possibilities  for  machine  dependent 
DPDA  optimization  are  limited  only  by  one's  imagination  and 
understanding  of  the  space-time  tradeoffs  inherent  in  the 
computing  environment  under  consideration. 

The optimizations performed by this procedure are straightforward, and will not be considered in further detail. A comparison of the Final DPDA Structures (Appendix B.3.12) with the Multics DPDA Structures (Appendix B.3.13) should give the reader an understanding of the DPDA optimizations that have been implemented.

Procedure Size:

Source: 650 PL/I Statements
Object: 5213 Words


B.2.10 lis_debug_monitor


This  procedure  may  be  invoked  instead  of  the  lis 
procedure  in  those  cases  in  which  the  LIS  system  is  being 
modified  and  debugged.  The  procedure  performs  some  simple 
initializations  related  to  its  debugging  options,  and 
immediately  invokes  the  lis  procedure.  Thereafter,  the  LIS 
system  responds  to  the  specified  debugging  requests  by 
invoking  the  appropriate  secondary  entry  points  below. 

print_grammar_structures

This entry prints the grammar structures (Section B.3.1) into the ".lis_debug" file.

print_configuration_information

This entry prints the configuration information structures (Section B.3.4) into the ".lis_debug" file.

print_cfsm

This entry prints the CFSM structures (Section B.3.7) into the ".lis_cfsm" file.

print_state_access

This entry prints the state access structures (Section B.3.8) into the ".lis_debug" file.

print_initial_dpda

This entry prints the initial DPDA structures (Section B.3.10) into the ".lis_init_dpda" file.


Procedure Size:

Source: 622 PL/I Statements
Object: 5834 Words
B.2.11 lis_processor_control


lis_processor_control is the procedure that controls the execution of the language processors produced by LIS. That is, it coordinates the lexical parser, the primary parser, and the language semantics.
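The coordination can be illustrated with a deliberately tiny Python sketch: a stand-in lexical stage produces tokens, a table-driven primary parser interprets read and apply states (look-ahead states are omitted), and a semantics callback is invoked for each (rule, alternative) applied. The DPDA tables are hand-built for the one-rule grammar <s> ::= ( <s> ) | x and, like all the names here, are hypothetical; this is not the actual lis_processor_control.

```python
# Hypothetical DPDA for <s> ::= ( <s> ) | x  (rule 1, alternatives 1 and 2).
READ = {
    "R0": {"(": "R2", "x": "A_x"},
    "R2": {"(": "R2", "x": "A_x"},
    "R4": {")": "A_paren"},
}
# apply state -> ((rule, alternative), entries popped, top transitions, default)
APPLY = {
    "A_x": ((1, 2), 1, {"R2": "R4"}, "FINAL"),
    "A_paren": ((1, 1), 3, {"R2": "R4"}, "FINAL"),
}


def lexical_parser(text):
    """Stand-in lexical stage: every non-blank character is one token."""
    return [ch for ch in text if not ch.isspace()]


def primary_parser(tokens, semantics):
    stack, position, trace = ["R0"], 0, []
    while True:
        state = stack[-1]
        if state == "FINAL":
            if position != len(tokens):
                raise SyntaxError(f"reading {tokens[position]!r} after end of sentence")
            trace.append("parse completed")
            return trace
        if state in APPLY:
            production, pops, tops, default = APPLY[state]
            trace.append(f"apply {production}")
            semantics(production)  # rule-level semantic routine
            del stack[-pops:]
            # Look-back on the uncovered top state; otherwise take the default.
            stack.append(tops.get(stack[-1], default))
            continue
        symbol = tokens[position] if position < len(tokens) else None
        if symbol not in READ[state]:
            raise SyntaxError(f"reading {symbol!r} in state {state}")
        trace.append(f"read {symbol}")
        stack.append(READ[state][symbol])
        position += 1


def processor_control(text, semantics):
    """Coordinate the lexical parser, the primary parser and the semantics."""
    return primary_parser(lexical_parser(text), semantics)
```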

Procedure Size (typical):

Source: 400 PL/I Statements
Object: 1000 Words
B.3 Major System Data Structures


In the following discussion, we present a description
of  the  major  data  structures  of  the  Language  Implementation 
System.  In  addition,  we  indicate  the  maximum  storage 
requirements  (in  36  bit  words)  of  these  structures  during 
the  computation  of  the  primary  DPDA  of  the  PL/I  grammar 
given  in  Appendix  D. 

The  chart  on  the  following  page  indicates  the  usage  of 
the  major  data  structures  throughout  LIS. 


B.3.1 Grammar Structures: grammar_rules

grammar_alts
grammar_symbols


The  grammar  structures  exist  so  as  to  provide  a 
convenient  representation  of  the  grammar  submitted  in  the 
LIS  Language  Definition.  They  are  created  by 
validate_definition, modified by analyze_grammar and
initialize_generator, and used by compute_cfsm,
convert_cfsm_to_dpda, optimize_dpda, and lis_debug_monitor.


grammar_rules contains information on the submitted
grammar that is pertinent at the BNF rule level. Entries
are made sequentially according to the order of occurrence
of the BNF rule in the submitted grammar. The grammar
contains a total of n_grammar_rules BNF rules.

defined_non_terminal

The  non-terminal  that  is  defined  in  the 
current  BNF  rule. 

n_alts 

The  number  of  alternatives  in  the  current  BNF 
rule. 

first_alt_index

The  index  into  grammar_alts  of  the  first 
alternative  of  the  current  BNF  rule. 

semantics 

A bit indicating whether semantics is
associated with the current BNF rule:

"0"b -> no semantics
"1"b -> semantics

grammar_type 

A bit indicating the type of grammar to which
the rule belongs:

"0"b -> lexical
"1"b -> primary


grammar_alts contains information on each alternative
of each BNF rule. Entries are made sequentially according to
the order of occurrence of the alternative in the submitted
grammar. The grammar contains a total of n_grammar_alts
alternatives.

first_symbol_index

The  index  into  grammar_symbols  of  the  first 
symbol  of  the  current  alternative. 

n_symbols

The number of symbols in the current
alternative.

any_non_terminals 

A bit indicating whether the current
alternative contains any non-terminals:

"0"b -> no non-terminals
"1"b -> at least one non-terminal

needed_by_language 

A bit indicating whether the current
alternative contributes to the language
defined by the submitted grammar:

"0"b -> not needed
"1"b -> needed

Set by analyze_grammar.

alt_configuration_index

The index into basic_configuration_info of the
configuration information associated with the
start of the current alternative. Set by
initialize_generator.


grammar_symbols describes the symbols that make up the
alternatives of the grammar. Entries are made sequentially
according to the order of occurrence of the symbol in the
submitted grammar. The grammar contains n_grammar_symbols
symbols.

symbol

If the current symbol is a non-terminal, then
symbol contains the non-terminal code. If the
current symbol is a terminal string, then
symbol is an index into the LIS Language
Definition segment of the start of the
terminal string.

l_symbol

If the current symbol is a non-terminal, then
l_symbol equals zero. If the current symbol
is a terminal string, then l_symbol is the
number of characters in the LIS Language
Definition segment that make up the terminal
string (i.e., escapes have not been resolved).


Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 4067 words.
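The three grammar structures form an index-linked hierarchy: a rule points at its first alternative, and an alternative points at its first symbol. The Python sketch below mirrors that layout for a toy grammar so the links can be followed concretely. It is an illustration under stated assumptions, not the PL/I declarations used in LIS: the class names, the 1-based indexing, the terminal offset of 10, and the helper alternatives_of are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GrammarRule:
    defined_non_terminal: int   # non-terminal code of the left-hand side
    n_alts: int                 # number of alternatives in the rule
    first_alt_index: int        # 1-based index into grammar_alts
    semantics: bool = False     # True -> semantics attached to the rule
    grammar_type: int = 1       # 0 -> lexical, 1 -> primary


@dataclass
class GrammarAlt:
    first_symbol_index: int     # 1-based index into grammar_symbols
    n_symbols: int


@dataclass
class GrammarSymbol:
    symbol: int    # non-terminal code, or offset of a terminal in the definition text
    l_symbol: int  # 0 for a non-terminal, else length of the terminal string


def alternatives_of(rule_no, rules, alts, symbols):
    """Follow the index links from one rule (1-based) down to its symbols."""
    rule = rules[rule_no - 1]
    result = []
    for a in range(rule.first_alt_index, rule.first_alt_index + rule.n_alts):
        alt = alts[a - 1]
        start = alt.first_symbol_index - 1
        result.append(symbols[start:start + alt.n_symbols])
    return result


# Hypothetical toy grammar: <list> ::= <item> | <list> "," <item>
# (non-terminal codes 1 = <list>, 2 = <item>; "," assumed at offset 10).
rules = [GrammarRule(defined_non_terminal=1, n_alts=2, first_alt_index=1)]
alts = [GrammarAlt(1, 1), GrammarAlt(2, 3)]
symbols = [GrammarSymbol(2, 0), GrammarSymbol(1, 0),
           GrammarSymbol(10, 1), GrammarSymbol(2, 0)]
```

A non-terminal is recognizable by l_symbol being zero, so no separate type flag is needed at the symbol level.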
B.3.2 Non-Terminal Structures: non_terminal_struct

non_terminal_cross_refs


The  non-terminal  structures  exist  so  as  to  provide  a 
convenient  representation  of  the  non-terminals  of  the 
submitted grammar. non_terminal_struct is created in
validate_definition and non_terminal_cross_refs is created
in analyze_grammar. The structures are used in
initialize_generator, compute_cfsm, convert_cfsm_to_dpda,
look_ahead, optimize_dpda, and lis_debug_monitor.


non_terminal_struct contains a separate entry for each
unique ("escapes" resolved) non-terminal of the grammar.
Subsequent to the creation of this structure, non-terminals
may be referenced by the index of their structure entry.
There are n_non_terminals non-terminals in the grammar.

non_terminal_name

The spelling of the non-terminal, with
"escapes" resolved.

n_definitions

The number of BNF rules in which the current
non-terminal is defined; set by
analyze_grammar.

first_definition_index

The index into non_terminal_cross_refs of the
first BNF rule in which the current
non-terminal is defined; set by
analyze_grammar.

n_references

The number of alternatives in which the
current non-terminal is referenced; set by
analyze_grammar.
first_reference_index

The index into non_terminal_cross_refs of the
first alternative in which the current
non-terminal is referenced; set by
analyze_grammar.

lexical_non_terminal

A bit indicating whether the current
non-terminal is a lexical non-terminal:

"0"b -> non-terminal is not <lexical_non_terminal>
"1"b -> non-terminal is <lexical_non_terminal>


non_terminal_cross_refs is a structure of non-terminal
cross references. Entries are sequential in the sense that
all definitions or references for a particular non-terminal
are contiguous within the structure.

cr_rule

The rule in which the non-terminal is
defined/referenced.

cr_alt

If the table entry represents a definition,
then cr_alt equals zero. If the entry
represents a reference, then cr_alt equals the
alternative of cr_rule in which the
non-terminal is referenced.



Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 4383 words.
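The contiguity requirement on non_terminal_cross_refs can be met by collecting definitions and references per non-terminal and then laying them out one non-terminal at a time. The sketch below is a hypothetical Python rendering of that idea, not the analyze_grammar code: the input format (non-terminals as integers, terminals as strings), the choice to place each non-terminal's definitions before its references, and the function name build_cross_refs are assumptions.

```python
def build_cross_refs(rules, n_non_terminals):
    """rules: list of (defined_nt, alts), where alts is a list of symbol lists;
    non-terminals are ints 1..n_non_terminals and terminals are strings.
    Returns (cross_refs, summary): cross_refs entries are (cr_rule, cr_alt),
    summary[nt] = (n_definitions, first_definition_index,
                   n_references, first_reference_index), all 1-based."""
    defs = {nt: [] for nt in range(1, n_non_terminals + 1)}
    refs = {nt: [] for nt in range(1, n_non_terminals + 1)}
    for rule_no, (lhs, alts) in enumerate(rules, start=1):
        defs[lhs].append((rule_no, 0))            # cr_alt = 0 marks a definition
        for alt_no, alt in enumerate(alts, start=1):
            # an alternative counts once per non-terminal it mentions
            for nt in dict.fromkeys(s for s in alt if isinstance(s, int)):
                refs[nt].append((rule_no, alt_no))
    cross_refs, summary = [], {}
    for nt in range(1, n_non_terminals + 1):
        first_def = len(cross_refs) + 1
        cross_refs.extend(defs[nt])               # definitions kept contiguous
        first_ref = len(cross_refs) + 1
        cross_refs.extend(refs[nt])               # references kept contiguous
        summary[nt] = (len(defs[nt]), first_def, len(refs[nt]), first_ref)
    return cross_refs, summary


# <list> ::= <item> | <list> "," <item>   and   <item> ::= "x"
cross_refs, summary = build_cross_refs([(1, [[2], [1, ",", 2]]), (2, [["x"]])], 2)
```

For <item> (non-terminal 2) this yields one definition at location 3 and two references starting at location 4, one from each alternative of rule 1.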
B.3.3 Key-Symbol Structures: key_symbols_struct

key_symbols

The  key-symbol  structures  exist  so  as  to  provide  a 
convenient  representation  of  the  key-symbols  (terminal 
symbols) of a primary grammar. The structures are created by
initialize_generator and used by compute_cfsm,
optimize_dpda, and lis_debug_monitor.

key_symbols_struct contains a separate entry for each
unique ("escapes" resolved) key-symbol in the primary
grammar. Subsequent to the creation of this table,
key-symbols may be referenced by the index of their
structure entry. There are n_key_symbols key-symbols in the
primary grammar.

key_start

An index into key_symbols indicating the
location of the start of the current
key-symbol.

key_length 

The  number  of  characters  in  the  current 
key-symbol.

key_symbols is the character string which is a
concatenation  of  all  unique  key-symbols  in  the  primary 
grammar.  There  are  n_key_chars  characters  in  the  string. 

Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 473 words.
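Storing every unique key-symbol once in a single string, with a (key_start, key_length) pair per entry, is what lets later structures refer to key-symbols by index alone. The following Python sketch illustrates that layout; it assumes escapes have already been resolved, uses 1-based key_start values to match the text, and its function names are hypothetical.

```python
def build_key_symbols(spellings):
    """Concatenate each unique key-symbol once.
    Returns (key_symbols, entries, index_of): entries[k-1] is the
    (key_start, key_length) of key-symbol k, key_start being 1-based."""
    key_symbols = ""
    entries = []
    index_of = {}
    for s in spellings:
        if s in index_of:          # already stored: reuse its index
            continue
        entries.append((len(key_symbols) + 1, len(s)))
        index_of[s] = len(entries)
        key_symbols += s
    return key_symbols, entries, index_of


def spelling(key_symbols, entries, k):
    """Recover the text of key-symbol k (1-based) from the concatenated string."""
    start, length = entries[k - 1]
    return key_symbols[start - 1:start - 1 + length]


key_symbols, entries, index_of = build_key_symbols(["begin", ";", "end", ";", ":="])
```

Here n_key_symbols would be 4 and n_key_chars 11, since the repeated ";" is stored only once.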
B.3.4 Configuration Information Structures: basic_configuration_info

apply_configuration_info


During the computation of the CFSM, it is necessary to
know  the  correspondence  between  bit  positions  in  the 
configurations  and  the  grammar  from  which  the  configurations 
are derived. Thus, one mapping exists from the grammar to
the  configurations,  and  another  from  the  configurations  to 
the  grammar.  The  first  mapping  is  provided  by 
grammar_alts.alt_configuration_index. The second mapping is
provided by the configuration information structures. The
structures are created by initialize_generator and used by
compute_cfsm and lis_debug_monitor.


basic_configuration_info contains information for each
symbol of each alternative of the grammar for which the
parser is to be computed. The structure has one entry for
each bit position in the basic portion of the configuration,
so that the i-th entry corresponds to the i-th position in
the configuration.


b_config_rule

The BNF rule to which the configuration bit
corresponds.

b_config_alt

The alternative within b_config_rule to which
the configuration bit corresponds.

b_config_symbol

The symbol to which the configuration bit
corresponds. If the grammar in question is
primary, then the symbol may be the number of
a non-terminal or may be the number of a
key-symbol, as indicated by
b_config_symbol_type. If the grammar in
question is lexical, then the symbol may be
the number of a non-terminal, may be a
terminal character, or may be the index into a
temporary table in which are stored the
special lexical encodings, as indicated by
b_config_symbol_type.

b_config_symbol_type

If the grammar in question is primary, then
b_config_symbol is:

"00"b -> number of key-symbol
"01"b -> number of non-terminal
"10"b -> not used
"11"b -> not used

If the grammar in question is lexical, then
b_config_symbol is:

"00"b -> terminal character
"01"b -> number of non-terminal
"10"b -> not used
"11"b -> special lexical encoding index
b_config_at_start

A bit indicating whether the configuration bit
corresponds to the start of an alternative:

"0"b -> not at start of alternative
"1"b -> at start of alternative

b_config_at_end

An integer, the value of which is interpreted
as follows:

> 0 The configuration bit corresponds to
the last symbol of an alternative, and
the value of b_config_at_end is an
index indicating the configuration
position at which the alternative is
applied.

= 0 The configuration bit does not
correspond to the last symbol of an
alternative.

apply_configuration_info contains information on the
application of each alternative of the grammar for which the
parser is to be computed. The structure has one entry for
each bit position of the apply portion of the configuration,
so that the i-th entry corresponds to the i-th position in
the configuration.

a_config_rule

The BNF rule to which the configuration bit
corresponds.

a_config_alt

The alternative within a_config_rule to which
the configuration bit corresponds.

a_config_n_to_pop

The number of states that will be popped from
the DPDA state stack when the alternative is
applied.

Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 5883 words.
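The two mappings described above can be built in a single pass over the grammar. The Python sketch below is a minimal illustration under assumptions: each alternative contributes one basic position per symbol and one apply position, the index kept in b_config_at_end points into the apply portion, and the number of states to pop equals the length of the alternative, as in a conventional LR parser. The names build_configuration_info, BasicInfo, and ApplyInfo are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BasicInfo:
    rule: int        # b_config_rule
    alt: int         # b_config_alt
    symbol: object   # b_config_symbol
    at_start: bool   # b_config_at_start
    at_end: int      # b_config_at_end: apply position if last symbol, else 0


@dataclass
class ApplyInfo:
    rule: int
    alt: int
    n_to_pop: int    # assumed equal to the length of the alternative


def build_configuration_info(grammar):
    """grammar[r-1] lists the alternatives (symbol lists) of rule r.
    Returns (basic, apply, alt_configuration_index); positions are 1-based."""
    basic, apply, alt_index = [], [], {}
    for r, alternatives in enumerate(grammar, start=1):
        for a, syms in enumerate(alternatives, start=1):
            apply.append(ApplyInfo(r, a, len(syms)))
            apply_pos = len(apply)
            alt_index[(r, a)] = len(basic) + 1   # grammar -> configuration mapping
            for k, sym in enumerate(syms):
                is_last = k == len(syms) - 1
                basic.append(BasicInfo(r, a, sym, k == 0,
                                       apply_pos if is_last else 0))
    return basic, apply, alt_index


# Rule 1: <list> ::= <item> | <list> "," <item>;  rule 2: <item> ::= "x"
basic, apply, alt_index = build_configuration_info([[[2], [1, ",", 2]], [["x"]]])
```

For this toy grammar the five basic positions and three apply positions follow the alternatives in order; the last symbol of <list> "," <item> carries apply position 2.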
B.3.5 Configuration Structures


As discussed in Chapter III, Section III.C, the
configurations play an integral part in the computation of a
grammar's CFSM, each state being generated from, and thus
corresponding to, a single configuration. The configuration
is, in effect, a computational notation for recording the
state configuration of a grammar while computing its CFSM.
The configurations consist of two parts, the basic
configuration and the apply configuration. The
interpretation of these is given in the discussion of the
configuration information structures (Section B.3.4). The
configurations are created by compute_cfsm and are not used
by the system once that procedure has exited.

Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 20848 words.
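Because a configuration is a pair of bit strings, the basic and apply portions can be held as integers used as bit sets. The sketch below assumes that basic bit i means "positioned before the symbol recorded at basic position i" and that apply bit j means "alternative j may be applied", and splits successor computation into the conventional LR(0) closure and advance steps. Whether compute_cfsm follows exactly these steps is an assumption, and the toy tables and function names are hypothetical.

```python
def closure(basic, config_symbol, starts_of):
    """Repeatedly set the start bit of every alternative of any non-terminal
    found at a set basic position. `basic` is an int used as a bit set."""
    changed = True
    while changed:
        changed = False
        for pos, sym in config_symbol.items():
            if (basic >> pos) & 1 and sym in starts_of:
                for start in starts_of[sym]:
                    if not (basic >> start) & 1:
                        basic |= 1 << start
                        changed = True
    return basic


def advance(basic, symbol, config_symbol, at_end):
    """Kernel of the successor configuration on `symbol`, as (basic, apply)."""
    new_basic = new_apply = 0
    for pos, sym in config_symbol.items():
        if (basic >> pos) & 1 and sym == symbol:
            if at_end[pos]:
                new_apply |= 1 << at_end[pos]   # alternative ready to be applied
            else:
                new_basic |= 1 << (pos + 1)     # move past the symbol just read
    return new_basic, new_apply


# <S> ::= <E>;  <E> ::= <E> "+" "x" | "x"   (basic positions 1..5, bit 0 unused)
config_symbol = {1: "E", 2: "E", 3: "+", 4: "x", 5: "x"}
at_end = {1: 1, 2: 0, 3: 0, 4: 2, 5: 3}
starts_of = {"S": [1], "E": [2, 5]}
initial = closure(1 << 1, config_symbol, starts_of)
```

The successor of the initial configuration on <E> has both a basic bit (position 3, before "+") and an apply bit (alternative 1), which is exactly the situation that makes a CFSM state inadequate and calls for look-ahead.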
B.3.6 Configuration Access Structures:

configuration_access_list_struct
configuration_access_list


Each  time  a  new  configuration  is  computed,  it  must  be 
determined  whether  the  same  configuration  has  already  been 
computed.  If  so,  references  to  the  new  configuration  may  be 
directed  to  the  previous  configuration,  and  the  new  one 
destroyed.  Since  each  CFSM  state,  and  hence  each 
configuration,  is  accessed  by  one  and  only  one  symbol,  the 
configuration  access  structures  are  threaded  so  that  all 
configurations  accessed  by  the  same  symbol  are  on  the  same 
list.  The  configuration  access  structures  are  created  by 
compute_cfsm and used by lis_debug_monitor.


configuration_access_list_struct is the structure that
bounds the threaded list (configuration_access_list) of
configurations that are accessed by the same symbol.

first_configuration_accessed_loc(i, 1)

The location in the list of the first
configuration entry accessed by the i-th
key-symbol, if the grammar is primary, or by the
i-th terminal character, if the grammar is lexical.

first_configuration_accessed_loc(i, 2)

The location in the list of the first
configuration entry accessed by the i-th
non-terminal, if the grammar is primary, or by
the i-th non-terminal or the i-th special
lexical encoding, if the grammar is lexical.

last_configuration_accessed_loc(i, 1)

The location in the list of the last
configuration entry accessed by the i-th
key-symbol, if the grammar is primary, or by
the i-th terminal character, if the grammar is
lexical.

last_configuration_accessed_loc(i, 2)

The location in the list of the last
configuration entry accessed by the i-th
non-terminal, if the grammar is primary, or by
the i-th non-terminal or the i-th special
lexical encoding, if the grammar is lexical.


configuration_access_list is the threaded list of
configuration indicators.

configuration_accessed

The number of the accessed configuration.

next_configuration_accessed_loc

The location in the list of the next
configuration entry; equals zero if the
current entry is the last in the list.


Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 1849 words.
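The threaded lists make the duplicate check cheap: only configurations reached by the same symbol need to be compared. The Python class below is a minimal sketch of that bookkeeping, assuming 1-based locations with 0 as the list terminator, as in the text. The class and method names, and the use of (i, 1) and (i, 2) tuples as list keys, are hypothetical.

```python
class ConfigurationAccess:
    """Threaded lists of configurations, one list per accessing symbol.
    Symbols are keyed as (i, 1) for key-symbols/terminal characters and
    (i, 2) for non-terminals/special encodings. Locations are 1-based and
    a next-location of 0 ends a list."""

    def __init__(self):
        self.first_loc = {}        # first_configuration_accessed_loc
        self.last_loc = {}         # last_configuration_accessed_loc
        self.entries = []          # [configuration_accessed, next_loc]
        self.configurations = []   # configuration number n is configurations[n - 1]

    def find_or_add(self, symbol, configuration):
        """Return (number, is_new); only the list for `symbol` is searched."""
        loc = self.first_loc.get(symbol, 0)
        while loc:
            number, next_loc = self.entries[loc - 1]
            if self.configurations[number - 1] == configuration:
                return number, False          # reuse; the new copy is discarded
            loc = next_loc
        self.configurations.append(configuration)
        number = len(self.configurations)
        self.entries.append([number, 0])
        new_loc = len(self.entries)
        if symbol in self.last_loc:
            self.entries[self.last_loc[symbol] - 1][1] = new_loc  # thread onto tail
        else:
            self.first_loc[symbol] = new_loc
        self.last_loc[symbol] = new_loc
        return number, True


acc = ConfigurationAccess()
r1 = acc.find_or_add((3, 1), (0b1000, 0b10))
r2 = acc.find_or_add((2, 2), (0b100, 0))
r3 = acc.find_or_add((3, 1), (0b10000, 0))
r4 = acc.find_or_add((3, 1), (0b10000, 0))   # duplicate of configuration 3
```

Keeping last_loc alongside first_loc lets a new entry be threaded onto the end of its list without walking it a second time.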
B.3.7 CFSM Structures: cfsm_states

cfsm_read_transitions
cfsm_apply_transitions


The  CFSM  structures  comprise  the  Characteristic  Finite 
State  Machine.  The  structures  are  created  by  compute_cfsm, 
and  are  used  by  convert_cfsm_to_dpda,  look_ahead, 
optimize_dpda,  and  lis_debug_monitor. 


cfsm_states contains information that is relevant to
the CFSM at the state level. As the CFSM is computed,
n_cfsm_states is advanced until a total of
n_cfsm_states + 1 states exist. The 0-th state exists for
purposes of initialization and look-ahead.

cfsm_state_type

A bit string indicating the type of state:

"00"b -> read state
"01"b -> apply state
"10"b -> inadequate state
"11"b -> not used

cfsm_accessing_symbol

The  symbol  with  which  the  state  is  accessed. 

If the grammar is primary, then the symbol may
be the number of a non-terminal, or may be the
number of a key-symbol, as indicated by
cfsm_accessing_symbol_type. If the grammar is
lexical, then the symbol may be the number of
a non-terminal, may be a terminal character,
or may be an index into a temporary table in
which are stored the special lexical
encodings, as indicated by
cfsm_accessing_symbol_type.

cfsm_accessing_symbol_type

A bit string indicating the type of symbol
accessing the state. If the grammar is
primary:

"00"b -> number of key-symbol
"01"b -> number of non-terminal
"10"b -> not used
"11"b -> not used

If the grammar is lexical:

"00"b -> terminal character
"01"b -> number of non-terminal
"10"b -> special lexical encoding index
"11"b -> not used

cfsm_corresponding_dpda_state 

The  number  of  the  DPDA  state  which  will  be 
generated  from  the  current  state.  That  is, 
the  number  of  a  DPDA  read  state,  DPDA  apply 
state,  or  DPDA  look-ahead  state. 

cfsm_n_transitions 

The  number  of  transitions  out  of  the  current 
state. 

cfsm_transitions_ptr

A  pointer  into  the  state  transitions, 
indicating  the  first  transition  out  of  the 
current  state. 


CFSM transitions: cfsm_read_transitions

cfsm_apply_transitions


cfsm_read_transitions and cfsm_apply_transitions
contain, respectively, information on the read transitions
and apply transitions of the CFSM states. As may be seen in
their declarations, however, they are not independent
structures; rather, they both overlay a structure of state
transitions, the point of overlay being established by the
value of cfsm_transitions_ptr of the state with which the
current transitions are associated (n_cfsm_states). The
transitions from each state are contiguous in the
transitions structure.
cfsm_read_transitions overlays the read transitions.


read_transition_symbol_type

A bit string indicating the current transition
type. If the grammar is primary, then:

"00"b -> number of key-symbol
"01"b -> number of non-terminal
"10"b -> not used
"11"b -> not used

If the grammar is lexical, then:

"00"b -> terminal character
"01"b -> number of non-terminal
"10"b -> not used
"11"b -> special lexical encoding


read_transition_symbol

The current transition symbol. If the grammar
is primary, then the symbol may be the number
of a non-terminal, or may be the number of a
key-symbol, as indicated by
read_transition_symbol_type. If the grammar
is lexical, then the symbol may be the number
of a non-terminal, may be a terminal
character, or may be an index into a temporary
table in which are stored the special lexical
encodings, as indicated by
read_transition_symbol_type.

read_transition_destination

The CFSM state that is the destination of the
current transition.


read_transition_dpda_state

The DPDA state in which the current transition
will be found. If the current read transition
is from a CFSM read state, then
read_transition_dpda_state will be the number
of the corresponding DPDA read state.
However, if the current transition is from a
CFSM inadequate state, then that state will
generate a corresponding DPDA look-ahead
state, and all read transitions out of the
CFSM state will generate a single DPDA read
state. In this latter case the value of
read_transition_dpda_state will correspond to
the number of this generated read state.


read_transition_not_used

Not used.
cfsm_apply_transitions overlays the apply
transitions.


apply_transition_type

Equals "01"b, by definition.

apply_transition_rule

The number of the BNF rule that is to be
applied.

apply_transition_alt

The number of the alternative of
apply_transition_rule that is to be applied.


apply_transition_dpda_state

The DPDA state in which the current transition
will be found. If the transition is from an
apply state, then apply_transition_dpda_state
will be the number of the corresponding DPDA
apply state. However, if the current
transition is from an inadequate CFSM state,
then that state will generate a corresponding
DPDA look-ahead state, and each apply
transition out of the CFSM state will generate
a separate DPDA apply state. In such a case,
the value of apply_transition_dpda_state will
correspond to the number of this generated
apply state.


apply_transition_n_to_pop

The number of states that will be popped from
the DPDA state stack when the alternative is
applied.


Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 25367 words.
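In PL/I the read and apply views overlay one contiguous transitions structure, selected through cfsm_transitions_ptr. Python has no overlays, so the sketch below tags each transition with its kind instead and slices the shared list using a per-state start index and count. The classification rule (reads only gives a read state, a single apply and nothing else gives an apply state, anything else is inadequate) is the conventional LR(0) criterion and is an assumption about how cfsm_state_type is set; all names are hypothetical.

```python
from dataclasses import dataclass

READ_STATE, APPLY_STATE, INADEQUATE_STATE = "00", "01", "10"   # cfsm_state_type codes


@dataclass
class Transition:
    kind: str       # "read" or "apply"; stands in for the PL/I overlay
    payload: tuple  # read: (symbol, destination); apply: (rule, alt, n_to_pop)


@dataclass
class CfsmState:
    n_transitions: int     # cfsm_n_transitions
    transitions_ptr: int   # 0-based start of this state's run of transitions


def transitions_of(state, transitions):
    """The contiguous run of transitions belonging to one state."""
    start = state.transitions_ptr
    return transitions[start:start + state.n_transitions]


def classify(state, transitions):
    ts = transitions_of(state, transitions)
    n_apply = sum(t.kind == "apply" for t in ts)
    if n_apply == 0:
        return READ_STATE
    if n_apply == 1 and len(ts) == 1:
        return APPLY_STATE
    return INADEQUATE_STATE


transitions = [
    Transition("read", ("x", 2)), Transition("read", ("E", 3)),    # state 1
    Transition("apply", (2, 2, 1)),                                # state 2
    Transition("read", ("+", 4)), Transition("apply", (1, 1, 1)),  # state 3
]
states = [CfsmState(2, 0), CfsmState(1, 2), CfsmState(2, 3)]
```

The third state mixes a read and an apply transition, so it is classified as inadequate; in LIS such a state is the one that generates a DPDA look-ahead state.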
B.3.8 State Access Structures: state_access_list_struct

state_access_list


The state access structures contain an entry for each
state in the CFSM. For each state, a threaded list is
established, the list containing all states that access the
state in question on a single transition. The list was
useful in debugging the system and plays an important role
in the LALR(k) algorithm. Since the LALR(k) algorithm was
replaced in favor of a simpler look-ahead scheme, the
structures serve no functional purpose in the current
system.


state_access_list_struct is the structure that bounds
the threaded list (state_access_list) of accessing states.
sal_state_start_loc 

An index into state_access_list indicating the
location  of  the  first  state  in  the  list. 

sal_state_end_loc 

An  index  into  state_access_list  indicating  the 
location  of  the  last  state  in  the  list. 


state_access_list is the threaded list of accessing
states.

sal_state

The number of the accessing CFSM state.

sal_next_state_loc

An  index  into  state_access_list  indicating  the 
location  of  the  next  state  in  the  list. 

Equals  zero  if  the  current  entry  is  the  last 
entry  in  the  list. 


Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 8653 words.
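The state access lists are the inverse of the read transitions: each state collects the states that reach it in one step. A minimal Python sketch of building them as threaded lists follows; it assumes transitions are given as (from_state, symbol, to_state) triples with states numbered from 1 (index 0 is left unused), and the function names are hypothetical.

```python
def build_state_access(n_states, read_transitions):
    """Returns (start_loc, end_loc, access_list) with 1-based locations;
    access_list entries are [sal_state, sal_next_state_loc], 0 ending a list."""
    start_loc = [0] * (n_states + 1)
    end_loc = [0] * (n_states + 1)
    access_list = []
    for src, _symbol, dst in read_transitions:
        access_list.append([src, 0])
        loc = len(access_list)
        if end_loc[dst]:
            access_list[end_loc[dst] - 1][1] = loc   # thread onto the tail
        else:
            start_loc[dst] = loc
        end_loc[dst] = loc
    return start_loc, end_loc, access_list


def accessing_states(state, start_loc, access_list):
    """Walk the threaded list for `state` and collect its accessing states."""
    result, loc = [], start_loc[state]
    while loc:
        src, loc = access_list[loc - 1]
        result.append(src)
    return result


start_loc, end_loc, access_list = build_state_access(
    6, [(1, "x", 2), (1, "E", 3), (3, "+", 4), (4, "x", 5), (6, "x", 2)])
```

State 2 is reached from states 1 and 6 on "x", so its list holds two entries, threaded from location 1 to location 5.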


B.3.9 Look-Back Structures: look_back_states_struct

look_back_states

The look-back structures contain an entry for each
state in the CFSM. For each CFSM state, S, containing apply
transitions, the structures form a list of states, T,
satisfying the following condition:

A  path  of  symbol  transitions  exists  from  S  to 
T  which  matches,  symbol  for  symbol,  the 
alternative  applied  in  exactly  one  of  the 
apply  transitions  of  S. 

The list is used in converting CFSM apply transitions
into DPDA apply states. The structures are created in two
steps. In compute_cfsm, the first four elements of
look_back_states are set. In convert_cfsm_to_dpda, the
internal procedure compute_look_back is invoked to create
look_back_states_struct and to set the last element of
look_back_states, resulting in the desired threaded list.

look_back_states_struct is the structure that bounds
the threaded list. For a CFSM state containing no apply
transition, both elements are zero.

lb_first_top_state_loc

An index into look_back_states indicating the
location of the first state in the list.

lb_last_top_state_loc

An index into look_back_states indicating the
location of the last state in the list.

look_back_states is the threaded list of states.


lb_top_state 

The  number  of  a  CFSM  state  satisfying  the 
condition  previously  established. 

lb_rule

The number of the BNF rule associated with the
path between T and S.

lb_alt

The number of the alternative of lb_rule
associated with the path between T and S.

lb_destination_state

The state that is reached upon taking the
transition from lb_top_state under the
non-terminal defined in lb_rule.

lb_next_top_state_loc

An index into look_back_states indicating the
location of the next state in the list; equals
zero if the current entry is the last entry in
the list.


Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 23460 words.
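The look-back condition can be checked by walking backwards from S through the transitions, matching the applied alternative symbol by symbol from its last symbol to its first; the states that remain are the top states T, and lb_destination_state is the transition from T on the rule's non-terminal. The Python sketch below assumes the CFSM read transitions are available as a dictionary keyed by (state, symbol); it is a hypothetical illustration, not the compute_look_back procedure.

```python
def look_back(state, alternative, lhs, goto):
    """Return sorted (top_state, destination) pairs for one apply transition of
    `state`. A top state T satisfies: reading `alternative` from T reaches
    `state`. goto maps (state, symbol) -> state."""
    frontier = {state}
    for symbol in reversed(alternative):          # match the alternative backwards
        frontier = {src for (src, sym), dst in goto.items()
                    if sym == symbol and dst in frontier}
    # In a well-formed CFSM every top state has a transition on lhs;
    # the membership test only guards against incomplete toy tables.
    return sorted((t, goto[(t, lhs)]) for t in frontier if (t, lhs) in goto)


# Hypothetical CFSM fragment for <E> ::= <E> "+" "x" | "x", reachable from
# two different contexts (states 1 and 6).
goto = {(1, "E"): 2, (1, "x"): 3, (2, "+"): 4, (4, "x"): 5,
        (6, "x"): 3, (6, "E"): 7}
```

State 3, which applies <E> ::= "x", has two look-back entries because it is reached from both contexts, whereas state 5 has only one.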


B.3.10 Initial DPDA Structures: initial_dpda_struct

id_read_states

id_apply_states

id_look_ahead_states

id_read_transitions

id_apply_transitions

id_look_ahead_transitions


The initial DPDA structures contain the non-optimized
DPDA. Though the currently configured DPDA could actually
be used for language recognition, optimizations will
eventually be performed on the structures, resulting in
significant improvements in space and time efficiency. The
initial DPDA structures are created by convert_cfsm_to_dpda
and look_ahead, and are used by optimize_dpda and
lis_debug_monitor.


initial_dpda_struct contains information on the
structure of the DPDA.

id_k_max

The  maximum  number  of  look-ahead  symbols 
required  for  the  grammar. 

id_n_read_states

The number of read states in the DPDA.

id_n_apply_states

The number of apply states in the DPDA.

id_n_look_ahead_states

The number of look-ahead states in the DPDA.

id_n_read_transitions

The number of read transitions in the DPDA.

id_n_apply_transitions

The number of apply transitions in the DPDA.

id_n_look_ahead_transitions

The number of look-ahead transitions in the
DPDA. 


id_read_states contains an entry for each read state in
the DPDA.

id_read_n_transitions

The number of read transitions associated with
the current state.

id_read_first_transition_index

An index into id_read_transitions indicating
the  location  of  the  first  read  transition  of 
the  current  state. 


id_apply_states contains an entry for each apply state
of the DPDA.

id_apply_rule

The number of the BNF rule that is to be
applied.

id_apply_alt

The number of the alternative of id_apply_rule
that is to be applied.

id_apply_n_to_pop

The number of states that will be popped from
the DPDA state stack when the alternative is
applied.

id_apply_n_top_states

The number of top-state (look-back state)
transitions associated with the current apply
state.

id_apply_first_top_state_index

An index into id_apply_transitions indicating
the location of the first apply transition of
the current state.
id_look_ahead_states contains an entry for each
look-ahead state of the DPDA.

id_look_ahead_n_transitions

The number of look-ahead transitions
associated with the current state.

id_look_ahead_first_transition_index

An index into id_look_ahead_transitions
indicating the location of the first
look-ahead transition of the current state.


id_read_transitions contains an entry for each read
transition of the DPDA (two entries for a special lexical
encoding). The transitions for each read state are
contiguous within the structure.

id_read_symbol

The current transition symbol. If the grammar
is primary, then the symbol is the number of a
key-symbol or lexical non-terminal. If the
grammar is lexical, then the symbol is a
terminal character or a special lexical
encoding, as indicated by the value of
id_read_destination. A special lexical
encoding is encoded as two transitions. For
the first transition, the lower bound encoding
character is entered as the symbol, and
id_read_destination and
id_read_destination_type are 0 and "00"b,
respectively. For the second transition, the
upper bound encoding character is entered, and
id_read_destination and
id_read_destination_type are set as
appropriate for the destination state of the
transition.

id_read_destination

The DPDA state that is the destination for the
current read transition; equals zero for the
first transition of a special lexical encoding
pair.


id_read_destination_type

A bit string indicating the type of
destination state for the current read
transition:

"00"b -> read state
"01"b -> apply state
"10"b -> look-ahead state
"11"b -> not used
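Encoding a special lexical encoding as a lower-bound/upper-bound pair means a lexical read state can still be searched with a single linear scan. The following Python sketch shows one way to interpret such a transition list. It assumes that a special lexical encoding denotes the contiguous range of characters between its two bound characters under the machine's collating sequence (here, Python's code-point order); the function name next_state is hypothetical.

```python
def next_state(transitions, ch):
    """transitions: list of (id_read_symbol, id_read_destination,
    id_read_destination_type) for one lexical read state. A special lexical
    encoding is a pair: (low, 0, "00") followed by (high, dest, type)."""
    lower = None
    for symbol, dest, dest_type in transitions:
        if dest == 0:                 # first half of a special lexical encoding
            lower = symbol
            continue
        if lower is not None:         # second half: test the whole range
            if lower <= ch <= symbol:
                return dest, dest_type
            lower = None
        elif ch == symbol:            # ordinary single-character transition
            return dest, dest_type
    return None                       # no transition for this character


transitions = [("+", 7, "01"),                  # "+"         -> apply state 7
               ("0", 0, "00"), ("9", 4, "00"),  # "0" to "9"  -> read state 4
               ("a", 0, "00"), ("z", 5, "00")]  # "a" to "z"  -> read state 5
```

The zero destination on the first entry of each pair is what distinguishes a range bound from an ordinary character transition, matching the convention described for id_read_destination.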


id_apply_transitions contains an entry for each apply
transition of the DPDA. The transitions for each apply
state are contiguous within the structure.

id_apply_top_state

The top state for the current apply
transition.

id_apply_destination

The DPDA state that is the destination for the
current apply transition.

id_apply_destination_type

A bit string indicating the type of
destination for the current apply transition:

"00"b -> read state
"01"b -> apply state
"10"b -> look-ahead state
"11"b -> not used


id_look_ahead_transitions contains an entry for each
look-ahead transition of the DPDA. The transitions for each
look-ahead state are contiguous within the structure.

id_look_ahead_symbols

The look-ahead symbols for the current
transition. If the grammar is primary, then
the symbols may be key-symbols or lexical
non-terminals. If the grammar is lexical, then
each symbol is either a terminal character or
the number of a special lexical encoding, as
indicated by id_look_ahead_special_lexical.

id_look_ahead_destination

The DPDA state that is the destination for the
current look-ahead transition.

id_look_ahead_destination_type

A bit string indicating the type of
destination state for the current look-ahead
transition:

"00"b -> read state
"01"b -> apply state
"10"b -> not used
"11"b -> not used


id_look_ahead_special_lexical(i)

A bit indicating whether the symbol in the
i-th position of the current look-ahead
transition is a special lexical encoding:

"0"b -> special lexical encoding
"1"b -> not special lexical encoding


Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 33044 words.
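Read states consume a symbol and push a state; apply states pop id_apply_n_to_pop states and then let the state left on top of the stack select the destination through the apply transitions. The sketch below simulates that cycle in Python for a small hand-built DPDA without look-ahead states. It is an illustration resting on assumptions: dictionaries stand in for the index arrays, apply states are pushed like any other state so that the pop count equals the length of the alternative, and a None destination signalling acceptance is a hypothetical convention.

```python
def run_dpda(read_states, apply_states, tokens, start):
    """read_states:  state -> {symbol: next_state}
    apply_states: state -> (rule, alt, n_to_pop, {top_state: destination})
    Returns the (rule, alt) pairs in the order they were applied."""
    stack = [start]
    applied = []
    tokens = iter(tokens)
    state = start
    while True:
        if state in apply_states:
            rule, alt, n_to_pop, top_states = apply_states[state]
            applied.append((rule, alt))           # semantics would be invoked here
            del stack[len(stack) - n_to_pop:]     # pop one state per symbol
            state = top_states[stack[-1]]         # destination chosen by top state
            if state is None:
                return applied
            stack.append(state)
        else:
            symbol = next(tokens, None)           # None stands for end of input
            try:
                state = read_states[state][symbol]
            except KeyError:
                raise SyntaxError(f"unexpected symbol {symbol!r}") from None
            stack.append(state)


# Toy grammar: rule 1 <S> ::= <E> "$";  rule 2 <E> ::= <E> "+" "x" | "x"
read_states = {"R1": {"x": "A1"}, "R2": {"+": "R4", "$": "A3"}, "R4": {"x": "A2"}}
apply_states = {
    "A1": (2, 2, 1, {"R1": "R2"}),   # <E> ::= "x"
    "A2": (2, 1, 3, {"R1": "R2"}),   # <E> ::= <E> "+" "x"
    "A3": (1, 1, 2, {"R1": None}),   # <S> ::= <E> "$", then accept
}
result = run_dpda(read_states, apply_states, ["x", "+", "x", "$"], "R1")
```

The applied pairs come out in the order a bottom-up parse recognizes the constructs, which is the point at which a processor built by LIS would activate the semantics for the corresponding BNF rule.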
B.3.11 Look-Ahead Structures: look_ahead_contour_struct

generation_contour_struct


The  look-ahead  structures  are  created  during  the 
process  of  resolving  each  inadequate  state  of  the  CFSM. 


look_ahead_contour_struct is built up one level (a
maximum of three) for each depth of look-ahead required.

lac_n_contour_states

The number of states in the current contour.

lac_state_index

An index into look_ahead_contour indicating
the location of the state from which
look-ahead is being performed at the current
level.

lac_transition_index

An index indicating the transition within the
state identified by lac_state_index from which
look-ahead is being performed at the current
level.

look_ahead_contour

The contour states for the current level.


generation_contour_struct is used to generate a
look-ahead contour. First the contour is generated in
generation_contour_struct and then copied into
look_ahead_contour at the appropriate level.


generation_participant(  i ) 

A  bit  indicating  whether  CFSM  state  i  has  been 
a  participant  in  the  generation  process. 

M0U  ->  not  a  participant 
■u)l,b  ->  a  participant 

generat ion_ccntour 

The  contour  of  generation  states. 
Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 229 words.
B.3.12 Final DPDA Structures


final_dpda_struct

fd_read_states

fd_apply_states

fd_look_ahead_states

fd_read_transitions

fd_apply_transitions

fd_look_ahead_transitions

fd_key_symbols_struct

fd_key_symbols


The  final  DPDA  structures  contain  the  optimized  DPDA. 
In multics_dpda_optimization we further optimize the DPDA by
"fine tuning" it for the Multics system. However, the
present structures constitute the essential functional
output  of  the  system  and  are  therefore  the  last  of  the  major 
system  data  structures  to  be  discussed  in  detail.  Given 
these  structures,  it  is  a  straightforward  process  to 
implement  a  procedure  to  "fine  tune"  the  structures  to  a 
particular  computing  environment. 


final_dpda_struct contains information on the
structure of the DPDA.

fd_grammar_type

A bit indicating the type of grammar from
which the DPDA was computed:

"0"b -> lexical
"1"b -> primary

fd_k_max 

The maximum number of look-ahead symbols
required  for  the  grammar. 

fd_read_states_offset 

The  offset  within  the  DPDA  segment  of  the 
start  of  the  read  states. 


fd_apply_states_offset

The offset within the DPDA segment of the
start of the apply states.

fd_look_ahead_states_offset

The offset within the DPDA segment of the
start of the look-ahead states.

fd_read_transitions_offset

The offset within the DPDA segment of the
start of the read transitions.

fd_apply_transitions_offset

The offset within the DPDA segment of the
start of the apply transitions.

fd_look_ahead_transitions_offset

The offset within the DPDA segment of the
start of the look-ahead transitions.

fd_non_lexical_chars 

If  the  grammar  is  lexical,  then  this  element 
contains  the  <non_lexical>  characters  for  the 
lexical  parser.  If  the  grammar  is  primary, 
then  this  element  is  null. 

fd_n_key_symbols

If  the  grammar  is  primary,  then  this  element 
contains  the  sum  of  the  number  of  key-symbols 
and  the  number  of  non-terminals  that  are 
<lexical_non_terminal>.  If  the  grammar  is 
lexical,  then  this  element  is  zero. 

fd_key_symbols_struct_offset

If the grammar is primary, then this element
contains the offset within the DPDA segment of
the start of fd_key_symbols_struct. If the
grammar is lexical, then this entry is zero.

fd_n_key_chars 

If  the  grammar  is  primary,  then  this  element 
contains  the  number  of  characters  in 
fd_key_symbols.  If  the  grammar  is  lexical, 
then  this  element  is  zero. 

fd_key_symbols_offset

If the grammar is primary, then this element
contains the offset within the DPDA segment of
the start of fd_key_symbols.

fd_next_dpda_offset

If the grammar is primary and if the DPDA
segment also contains a parser for the lexical
grammar, then this element contains the offset
within the DPDA segment of fd_dpda_struct for
the lexical DPDA.


fd_read_states contains an entry for each read state of
the DPDA.

fd_read_n_transitions 

The  number  of  read  transitions  associated  with 
the  current  state. 

fd_read_first_transition_index

An  index  into  fd_read_transitions  indicating 
the  location  of  the  first  read  transition  of 
the  current  state. 


fd_apply_states contains an entry for each apply state
of the DPDA.

fd_apply_rule

The number of the BNF rule that is to be
applied.

fd_apply_alt

The number of the alternative of fd_apply_rule
that is to be applied.

fd_apply_n_to_pop

The number of states that will be popped from
the DPDA stack when the alternative is
applied.

fd_apply_semantics

A bit indicating whether semantics is
associated with BNF rule fd_apply_rule in the
LIS language definition:

"0"b -> no semantics
"1"b -> semantics

fd_n_top_states

The number of top state (look-back state)
transitions associated with the current apply
state.

fd_apply_first_top_state_index

An index into fd_apply_transitions indicating
the location of the first apply transition
(top state transition) of the current state.


fd_look_ahead_states contains an entry for each
look-ahead state of the DPDA.

fd_look_ahead_n_transitions

The number of look-ahead transitions
associated with the current state.

fd_look_ahead_first_transition_index

An index into fd_look_ahead_transitions
indicating the location of the first
look-ahead transition of the current state.


fd_read_transitions contains an entry for each read
transition of the DPDA (two entries for a special lexical
encoding). The transitions for each read state are
contiguous within the structure.

fd_read_symbol

The current transition symbol. If the grammar
is primary, then the symbol is the number of a
key-symbol. If the grammar is lexical, then
the symbol is a terminal character or a
special lexical encoding, as indicated by the
value of fd_read_destination. A special
lexical encoding is encoded as two
transitions. For the first of such
transitions, the lower bound encoding
character is entered as the symbol, and
fd_read_destination and
fd_read_destination_type are 0 and "00"b,
respectively. For the second of such
transitions, the upper bound encoding character
is entered, and fd_read_destination and
fd_read_destination_type are set as
appropriate for the destination state of the
transition.
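To make the two-entry encoding concrete, here is a minimal Python sketch (not the thesis's PL/I) of how a lexical parser might scan the contiguous read transitions of one read state. It assumes that the count passed in is the number of table entries rather than logical transitions, and that a pair whose first entry has destination 0 and type "00"b always denotes a lower/upper bound range; the names ReadTransition and next_state are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

READ = "00"  # fd_read_destination_type value for a read state


@dataclass
class ReadTransition:
    symbol: str            # fd_read_symbol: a terminal character here
    destination: int       # fd_read_destination
    destination_type: str  # fd_read_destination_type, e.g. "00", "01", "10"


def next_state(transitions: List[ReadTransition], first_index: int,
               n_entries: int, ch: str) -> Optional[Tuple[int, str]]:
    """Find the read transition on ch for one read state.

    Assumption: a special lexical encoding is stored as two consecutive
    entries, the first with destination 0 and type "00" holding the lower
    bound, the second holding the upper bound and the real destination.
    """
    i = first_index
    end = first_index + n_entries
    while i < end:
        t = transitions[i]
        if t.destination == 0 and t.destination_type == READ and i + 1 < end:
            upper = transitions[i + 1]
            if t.symbol <= ch <= upper.symbol:  # inside the encoded range
                return upper.destination, upper.destination_type
            i += 2  # skip both halves of the range
            continue
        if t.symbol == ch:  # ordinary single-character transition
            return t.destination, t.destination_type
        i += 1
    return None  # no transition: the character is not acceptable here


# Example: one read state with ranges a..z and 0..9 plus a plain ';'.
table = [
    ReadTransition("a", 0, "00"), ReadTransition("z", 5, "00"),
    ReadTransition("0", 0, "00"), ReadTransition("9", 6, "01"),
    ReadTransition(";", 7, "10"),
]
print(next_state(table, 0, len(table), "q"))  # (5, '00')
```

Scanning stops at the first matching entry, so the sketch also assumes the ranges of one state do not overlap, which a deterministic automaton would guarantee.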
fd_read_destination

The DPDA state that is the destination for the
current read transition.


fd_read_destination_type

A bit string indicating the type of
destination state for the current read
transition:

"00"b -> read state
"01"b -> apply state
"10"b -> look-ahead state
"11"b -> not used


fd_apply_transitions contains an entry for each apply
transition of the DPDA. The transitions for each apply
state are contiguous within the structure.


fd_apply_top_state

The top state for the current apply
transition.


fd_apply_destination

The DPDA state that is the destination for the
current apply transition.


fd_apply_destination_type

A bit string indicating the type of
destination state for the current apply
transition:

"00"b -> read state
"01"b -> apply state
"10"b -> look-ahead state
"11"b -> not used
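The apply-state fields above suggest the usual LR reduce step: pop fd_apply_n_to_pop states, then use the state left on top of the stack (the look-back state) to select one of the apply transitions. The Python sketch below illustrates that reading. The class and function names are hypothetical, calling the semantic routine before popping is an assumption, and pushing the destination state is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ApplyState:
    rule: int                   # fd_apply_rule
    alt: int                    # fd_apply_alt
    n_to_pop: int               # fd_apply_n_to_pop
    semantics: bool             # fd_apply_semantics
    n_top_states: int           # fd_n_top_states
    first_top_state_index: int  # fd_apply_first_top_state_index


@dataclass
class ApplyTransition:
    top_state: int         # fd_apply_top_state
    destination: int       # fd_apply_destination
    destination_type: str  # fd_apply_destination_type


def apply_step(state: ApplyState, transitions: List[ApplyTransition],
               stack: List[int],
               semantic_routine: Optional[Callable[[int, int], None]] = None
               ) -> Tuple[int, str]:
    """Reduce by (rule, alt) and return the destination state and its type."""
    if state.semantics and semantic_routine is not None:
        semantic_routine(state.rule, state.alt)  # assumed to run before popping
    if state.n_to_pop:
        del stack[-state.n_to_pop:]  # pop the states of the recognized construct
    top = stack[-1]  # the look-back (top) state
    start = state.first_top_state_index
    for t in transitions[start:start + state.n_top_states]:
        if t.top_state == top:
            return t.destination, t.destination_type
    raise ValueError(f"no apply transition for top state {top}")


# Reducing "<expression> + <term>" (rule 10, alternative 1) pops three states.
calls = []
stack = [0, 3, 8, 9]
reduce_expr = ApplyState(rule=10, alt=1, n_to_pop=3, semantics=True,
                         n_top_states=2, first_top_state_index=0)
apply_table = [ApplyTransition(0, 12, "00"), ApplyTransition(4, 13, "01")]
print(apply_step(reduce_expr, apply_table, stack,
                 lambda r, a: calls.append((r, a))))
# (12, '00'); stack is now [0] and calls is [(10, 1)]
```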


fd_look_ahead_transitions contains an entry for each
look-ahead transition of the DPDA (two entries if at least
one symbol of the transition is a special lexical encoding).
The transitions for each look-ahead state are contiguous
within the structure.


fd_look_ahead_symbols

A sequence of symbols indicating the
look-ahead symbols for the current transition.
If the grammar is primary, then each symbol is
the number of a key-symbol. If the grammar is
lexical, then the symbols are either all
terminal characters, or contain at least one
special lexical encoding, as indicated by the
values of fd_look_ahead_destination and
fd_look_ahead_destination_type. In the case
that the symbols contain at least one special
lexical encoding, two
fd_look_ahead_transitions entries are required,
the first such entry being indicated by values
of 0 and "00"b for fd_look_ahead_destination
and fd_look_ahead_destination_type,
respectively. For those symbols of such a
transition that are not special lexical
encoding symbols (i.e. that are terminal
characters), the terminal character is entered
in the appropriate position in the first entry
and the corresponding position in the second
entry is set to the blank character. For
those symbols of such a transition that are
special lexical encodings, the lower bound
encoding character is entered in the
appropriate position in the first entry and
the upper bound encoding character is entered
in the corresponding position in the second
entry.
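Read literally, a two-entry look-ahead transition accepts the next k characters when each position either equals the terminal character in the first entry (with a blank in the second entry) or lies between the lower and upper bound characters. A minimal Python sketch of that test follows. The function name is hypothetical, and it assumes a blank is never itself the upper bound of a special lexical encoding.

```python
def matches_look_ahead(first: str, second: str, text: str) -> bool:
    """Check the next characters of text against a two-entry transition.

    first  -- symbols of the first entry (terminals or lower bounds)
    second -- symbols of the second entry (blank or upper bounds)
    """
    if len(text) < len(first):
        return False  # not enough input left to look ahead
    for lo, hi, c in zip(first, second, text):
        if hi == " ":
            if c != lo:  # terminal character position: exact match required
                return False
        elif not (lo <= c <= hi):  # special lexical encoding: range check
            return False
    return True


# "=" followed by any lowercase letter, as a k = 2 look-ahead.
print(matches_look_ahead("=a", " z", "=q"))  # True
print(matches_look_ahead("=a", " z", "=Q"))  # False
```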

fd_look_ahead_destination

The DPDA state that is the destination for the
current look-ahead transition.

fd_look_ahead_destination_type

A bit string indicating the type of
destination state for the current transition:

"00"b -> read state
"01"b -> apply state
"10"b -> not used
"11"b -> not used


fd_key_symbols_struct contains a separate entry for
each unique (escapes resolved) key-symbol and lexical
non-terminal in the primary grammar.

fd_key_start

An index into fd_key_symbols indicating the
location of the start of the current
key-symbol or lexical non-terminal.


fd_key_length

The absolute value of this entry is the length
of the key-symbol or lexical non-terminal. If
the value of the entry is greater than zero,
the entry is a key-symbol. If the value of
the entry is less than zero, the entry is a
lexical non-terminal.


fd_key_symbols is the character string that is the
concatenation of all key-symbols and lexical non-terminals.
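The sign convention of fd_key_length lets a single character string hold both kinds of entry. The Python sketch below builds such a string and its (fd_key_start, fd_key_length) pairs and then decodes one entry. The function names are hypothetical, 1-based starts (as PL/I substr would use) are an assumption, and storing lexical non-terminals with their angle brackets is only for illustration.

```python
from typing import List, Tuple


def build_key_symbols(key_symbols: List[str],
                      lexical_non_terminals: List[str]
                      ) -> Tuple[str, List[Tuple[int, int]]]:
    """Build fd_key_symbols and the (fd_key_start, fd_key_length) entries.

    Starts are 1-based (an assumption, matching PL/I substr); lengths are
    positive for key-symbols and negative for lexical non-terminals.
    """
    pieces: List[str] = []
    entries: List[Tuple[int, int]] = []
    start = 1
    for names, sign in ((key_symbols, 1), (lexical_non_terminals, -1)):
        for name in dict.fromkeys(names):  # one entry per unique spelling
            entries.append((start, sign * len(name)))
            pieces.append(name)
            start += len(name)
    return "".join(pieces), entries


def lookup(fd_key_symbols: str, entry: Tuple[int, int]) -> Tuple[str, str]:
    """Recover the spelling and kind of one fd_key_symbols_struct entry."""
    start, length = entry
    spelling = fd_key_symbols[start - 1:start - 1 + abs(length)]
    kind = "key-symbol" if length > 0 else "lexical non-terminal"
    return spelling, kind


chars, key_table = build_key_symbols(["PROCEDURE", ";", "END", ";"],
                                     ["<identifier>"])
print(chars)      # PROCEDURE;END<identifier>
print(key_table)  # [(1, 9), (10, 1), (11, 3), (14, -12)]
```

Under this reading, fd_n_key_symbols corresponds to len(key_table) and fd_n_key_chars to len(chars).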

Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 7400 words.
B.3.13 Multics DPDA Structures


The Multics DPDA structures are an optimized version of
the final DPDA structures that are "fine tuned" for the
Multics system. As such, they represent no functional
advancement over the final DPDA structures and therefore are
not discussed in detail. The type of fine tuning done for
the Multics environment made a significant impact on both
time and space efficiency, and analogous optimizations would
be appropriate for other computing environments.

Maximum storage requirements of structures during
computation of DPDA of PL/I primary grammar: 1717 words.
Appendix C


LIS Application: pl6535


C.1 Introduction


The Common Base Language is being designed by the
Computation Structures Group of MIT's Project MAC to serve
as the intermediate target representation language (abstract
language) of a particular formal semantic system. According
to Dennis (Den 72):


When  the  meaning  of  algorithms  expressed  in 
some  programming  language  has  been  specified 
in precise terms, we say that a formal
semantics for the language has been given. A
formal  semantics  for  a  programming  language 
generally  takes  the  form  of  two  sets  of  rules 
—  one  set  being  a  translator,  and  the  second 
set  being  an  interpreter.  The  translator 
specifies  a  transformation  of  any  well-formed 
program  expressed  in  the  source  language  (the 
concrete  language)  into  an  equivalent  program 
expressed  in  a  second  language  -  the  abstract 
language  of  the  definition.  The  interpreter 
expresses  the  meaning  of  programs  in  the 
abstract  language  by  giving  explicit 
directions  for  carrying  out  the  computation  of 
any  well-formed  abstract  program  as  a 
countable  set  of  primitive  steps. 


In this appendix, we discuss the design and
implementation of a translator from a simple block
structured language into the Common Base Language. The
presentation assumes a familiarity with the Base Language
(Den 72), though in Part C.3 we define those Base Language
primitives that are used in the translation.

As regards the theoretical development of the Base
Language, the real contribution made by the present effort
is probably that of formally specifying the translation of
common higher level language constructs into the Base
Language primitives, and not necessarily the development of
the translator itself. However, if the Base Language
is ever to escape the realm of pure theory and penetrate the
world of actual programming language development and
implementation, a translator from a particular concrete
language into the Base Language will certainly have to be
among the first priorities. Thus our attitude is
roughly:

Given  that  the  Base  Language  is  in  a  very 
early  stage  of  development,  and  even  though 
much  progress  both  in  hardware  and  in  software 
must  take  place  before  it  will  enjoy 
reasonable  acceptance  as  an  implementation 
device,  can  we  nevertheless  do  something 
meaningful towards an implementation, given
what has been done?

Assuming that a primitive Base Language interpreter
will eventually be implemented on Multics, the translator
presented  here  can  be  extended  and  combined  with  the 
interpreter  so  as  to  realize  a  complete  translator 
implementation  for  a  legitimate  programming  language. 
C.2 The pl6535 Language


The concrete language that we implemented is called
pl6535. pl6535 is a simple block structured language
similar to the language specified by Flinker (Fli 72).
However, Flinker's language defines ambiguous expressions,
and herein lies the primary difference between his language
and pl6535. The BNF specification of the syntax of pl6535
is given below.


The Syntax of pl6535


(1)  <primary_non_terminal> ::=
         <procedure> !

(2)  <procedure> ::=
         <procedure_head> <body> <procedure_end> !

(3)  <procedure_head> ::=
         <identifier> : PROCEDURE ( <variable_list> ) ; !
         <identifier> : PROCEDURE ; !

(4)  <variable_list> ::=
         <variable_list> , <identifier> !
         <identifier> !

(5)  <body> ::=
         <body> <statement> !
         <statement> !

(6)  <procedure_end> ::=
         END <identifier> ; !


Statements

(7)  <statement> ::=
         <label> <statement> !
         <assignment_statement> !
         <conditional_statement> !
         <return_statement> !
         <goto_statement> !
         <declare_statement> !
         <procedure> !

(8)  <label> ::=
         <identifier> : !

(9)  <assignment_statement> ::=
         <identifier> = <identifier> ( <variable_list> ) ; !
         <identifier> = <expression> ; !

(10) <expression> ::=
         <expression> + <term> !
         <expression> - <term> !
         <term> !

(11) <term> ::=
         <term> * <factor> !
         <term> / <factor> !
         <factor> !

(12) <factor> ::=
         ( <expression> ) !
         <identifier> !
         <integer> !

(13) <conditional_statement> ::=
         IF <equality> THEN <statement> !

(14) <equality> ::=
         <expression> = <expression> !

(15) <return_statement> ::=
         RETURN ( <identifier> ) ; !

(16) <goto_statement> ::=
         GOTO <identifier> ; !

(17) <declare_statement> ::=
         DECLARE ( <variable_list> ) ; !


Lexical Constructs

(18) <lexical_non_terminal> ::=
         <identifier> !
         <integer> !

(19) <identifier> ::=
         <identifier> a->z !
         <identifier> A->Z !
         <identifier> 0->9 !
         a->z !
         A->Z !

(20) <integer> ::=
         <integer> 0->9 !
         0->9 !




The first BNF rule specifies that the goal symbol of
the primary grammar, <primary_non_terminal>, is defined to
be a <procedure>. BNF rule 18 defines <identifier> and
<integer> to be <lexical_non_terminal>s, so that LIS will
generate a separate parser (the lexical parser) for
recognizing these constructs.
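LIS generates the lexical parser as a DPDA from rules 18 to 20. Purely to illustrate what those rules accept, the hand-written Python scanner below (a sketch, not LIS output) recognizes <identifier>s and <integer>s and treats everything else as a key-symbol. Classifying word-like key-symbols as reserved follows restriction d in the list of context-sensitive restrictions; skipping blanks as non-lexical characters, and the names RESERVED, PUNCTUATION and scan, are assumptions.

```python
from typing import List, Tuple

# Word-like key-symbols of the primary grammar (reserved, so never <identifier>s).
RESERVED = {"PROCEDURE", "END", "IF", "THEN", "RETURN", "GOTO", "DECLARE"}
# Single-character key-symbols appearing in BNF rules 1-17.
PUNCTUATION = set(":;,()=+-*/")


def is_letter(c: str) -> bool:
    return "a" <= c <= "z" or "A" <= c <= "Z"


def is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def scan(text: str) -> List[Tuple[str, str]]:
    """Split pl6535 text into (kind, spelling) pairs."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1  # assumed non-lexical
        elif is_letter(c):  # rule 19: a letter, then letters or digits
            j = i + 1
            while j < len(text) and (is_letter(text[j]) or is_digit(text[j])):
                j += 1
            word = text[i:j]
            tokens.append(("key", word) if word in RESERVED
                          else ("identifier", word))
            i = j
        elif is_digit(c):  # rule 20: a run of digits
            j = i + 1
            while j < len(text) and is_digit(text[j]):
                j += 1
            tokens.append(("integer", text[i:j]))
            i = j
        elif c in PUNCTUATION:
            tokens.append(("key", c))
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r} at {i}")
    return tokens


print(scan("f: PROCEDURE (n); x = n + 12; END f;")[:4])
# [('identifier', 'f'), ('key', ':'), ('key', 'PROCEDURE'), ('key', '(')]
```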


In addition to the context-free restrictions placed on
the language by the BNF specification, we have the
additional (context-sensitive) restrictions:

a. No external <procedure> calls are allowed,
with the exception of recursion.

b. Non-local gotos are not implemented.

c. All <identifier>s must be declared or defined
as follows:

i. Variables which occur in
<expression>s, as arguments in
<procedure> calls (BNF (9.1)), or in
<return_statement>s, must be
explicitly declared in a
<declare_statement> or implicitly
declared as parameters in a
<procedure> definition (BNF (3.1)).

ii. All <identifier>s referenced by
<goto_statement>s must be defined by
their occurrence as a <label> in the
same block.

iii. All <procedure>s referenced in a
given block must be defined in that
block or in a lexicographically
enclosing block.

d. The terminal symbols of the primary grammar
(key-symbols) are reserved.
The set of Base Language primitives adopted is
substantially that defined by Dennis (Den 72) and used by
Flinker (Fli 72). Noting that Flinker used a single-address
form of the delete instruction while Dennis used a
two-address form, we decided that both forms should be
accepted, since both are functionally attractive, and since
their usage is unambiguous. Two new primitives have been
added to solve specific problems: ifgoto, to allow
conditional transfer, and assign, to allow straightforward
translation of <assignment_statement>s such as "a=b;".


The full list of primitives used is as follows:

assign a,b

The value of "a" is copied and the copy
becomes the value of "b".

const p,q

Construct an elementary object called "q"
having as its value the constant "p".

create p

Create an elementary object called "p" having
no value.

delete p

The selector "p" no longer exists in the local
structure, and all parts of the object which
do not share will also cease to exist.

delete p,n

Like delete p, except that only the "n" branch
of the structure "p" is deleted.

goto p

Take the instruction selected by "p" as the
next to be executed.


ifgoto a,b,p

If "a" = "b", then goto "p".


link a,b,x

The "b" branch of "a" is set to share the
object at "x".

nma.  p«q 

The program structure "p" is linked to the
current local structure under the name "q".

apply p,q

The instruction following the apply is made
dormant, pending the return from the called
program "p". A local structure is created for
"p", containing a link to the argument
structure "q". The next instruction will be
the zero-th of "p".

return

The local structure for the current program is
deleted and control returns to the calling
program.

select a,b,x

"x" is set to share the object at the "b"
branch of "a".

(add, sub, mult, divide) a,b,x

Perform "a" op "b" and store the result in
"x".


In all of these instructions, should the target
structure or object not already exist, it will be created.
Readers wishing more graphic explanations of these primitive
instructions are referred to the papers by Dennis (Den 72)
and Flinker (Fli 72).
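As a rough aid to reading the translation schemes that follow, the Python sketch below interprets a small subset of these primitives (const, create, assign, add, sub, mult, divide, goto, ifgoto and the one-address delete) over a flat dictionary of selectors. It is a hypothetical simplification, not the Base Language semantics of Dennis: it ignores sharing, structured objects, link, select, apply and return, assumes that goto and ifgoto select an instruction index stored with const, and treats divide as integer (floor) division.

```python
import operator
from typing import Dict, List, Tuple

ARITHMETIC = {
    "add": operator.add,
    "sub": operator.sub,
    "mult": operator.mul,
    "divide": operator.floordiv,  # integer division: an assumption
}


def run(program: List[Tuple], max_steps: int = 10_000) -> Dict[str, object]:
    """Execute a flat list of (op, operands...) tuples; return the local structure."""
    local: Dict[str, object] = {}
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return local  # fell off the end: program finished
        op, *args = program[pc]
        pc += 1
        if op == "const":        # const p,q: q gets the constant p
            value, name = args
            local[name] = value
        elif op == "create":     # create p: p exists with no value
            local[args[0]] = None
        elif op == "assign":     # assign a,b: a copy of a becomes the value of b
            a, b = args
            local[b] = local[a]
        elif op in ARITHMETIC:   # op a,b,x: x = a op b
            a, b, x = args
            local[x] = ARITHMETIC[op](local[a], local[b])
        elif op == "goto":       # goto p: next instruction is the one selected by p
            pc = local[args[0]]
        elif op == "ifgoto":     # ifgoto a,b,p: if a = b then goto p
            a, b, p = args
            if local[a] == local[b]:
                pc = local[p]
        elif op == "delete":     # delete p (one-address form only)
            del local[args[0]]
        else:
            raise ValueError(f"unsupported primitive {op!r}")
    raise RuntimeError("step limit exceeded")


# 5! computed with a loop; instruction indices are shown in comments.
factorial = [
    ("const", 5, "n"),                # 0
    ("const", 1, "one"),              # 1
    ("const", 1, "acc"),              # 2
    ("const", 0, "zero"),             # 3
    ("const", 10, "exit"),            # 4
    ("const", 6, "top"),              # 5
    ("ifgoto", "n", "zero", "exit"),  # 6
    ("mult", "acc", "n", "acc"),      # 7
    ("sub", "n", "one", "n"),         # 8
    ("goto", "top"),                  # 9
]
print(run(factorial)["acc"])  # 120
```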
C.4 The Structure of the pl6535 Translator


The Language Implementation System was used to
implement both the lexical and the primary parsers of the
pl6535 translator. The semantics of the translation was
specified using PL/I, and the translator was implemented as
a two-pass compiler. The first pass determines the
environment requirements for the <procedure>s of a submitted
pl6535 program, while the second pass uses this environment
information in generating the actual Base Language
translation of that program. The Pass-1 and Pass-2
semantics are specified in detail in Sections C.5 and C.6,
respectively.


Due to the current absence of error recovery procedures
in LIS, the only pl6535 programs submitted for translation
were error-free programs.
C.5 The pl6535 Translator: Pass-1


In  the  literature  to  date  on  the  Base  Language,  the 
approach  taken  in  handling  non-local  variable  references  in 
block  structured  languages  has  been  to  pass  these  non-local 
variables  as  arguments  to  those  procedures  in  which  they  are 
referenced.  This  may  involve  several  levels  of  passing,  and 
in attempting to translate such languages, one has two basic
choices:

a.  Implement  a  one  pass  translator,  and  generate 
code  "on  the  fly".  This  involves  the  chaining 
of  all  procedures  referencing  a  particular 
variable.  Then,  when  the  appropriate 
declaration  for  that  variable  has  been 
encountered,  its  chain  is  traversed,  and  the 
appropriate  link  and  select  instructions  are 
inserted  in  the  code  already  generated. 

b. Implement a two-pass translator, and use the
first pass to determine the environment
requirements of each procedure. This means
that for each procedure, it will be determined
during the first pass which non-local
variables must be passed to that procedure
when it is called. The second pass then uses
this information to generate code in such a
way that chaining and insertion of
instructions is avoided.


The  second  of  the  above  methods  is  by  far  the  simpler, 
and  is  the  one  that  we  have  chosen  in  our  implementation. 
It  avoids  the  complex  chaining  strategy  associated  with  the 
first  method,  though  as  will  be  seen  when  Pass-2  is 
discussed,  a  simplified  version  of  the  chaining  strategy  has 
been  retained  for  the  treatment  of  <label>s. 
C.5.1 Pass-1 Data Structures


Several data structures are used by Pass-1 in
generating the <procedure> environments.


text_reference_stack

1 text_reference_stack(top)
      based(text_reference_stack_ptr),
  2 construct                char(10) unaligned,

top                          fixed binary(17),

text_reference_stack_ptr     pointer,


text_reference_stack is a stack containing the lexical
constructs recognized by the lexical parser. top marks the
top  of  the  stack,  and  construct  contains  the  actual  spelling 
of  the  lexical  construct  for  a  particular  stack  entry.  The 
reader  should  refer  to  Appendix  A  for  a  description  of  how 
the  text_reference_stack  is  used  to  gain  access  to  the 
lexical  constructs. 


symbol_table

1 symbol_table(10),
  2 proc_name_st             char(10),
  2 symbol_count             fixed binary(17),
  2 symbol_entry(20),
    3 symbol_spell           char(10),
    3 declared               bit(1),

current_level                fixed binary(17).


Our implementation allows a nesting of a maximum of 10
block levels, the current level being indicated by
current_level. During the parse of a pl6535 program, and
during the execution of that program, only one block
associated with each level may be active at a given instant.
symbol_table represents the symbol table for a maximum of 10
blocks that are simultaneously active, and identifies the
active blocks.


Within symbol_table, the entries have the following
meaning:

proc_name_st

The name of the <procedure> for which the
symbol table has been established.

symbol_count

The number of symbols entered into the table
for proc_name_st.

symbol_entry

Each entry in the symbol table for
proc_name_st has two parts:

symbol_spell

The symbol name.

declared

A bit indicating whether symbol_spell is
declared in proc_name_st:

"0"b -> not declared.
"1"b -> declared.


In the semantics for Pass-1, we have implemented a
procedure for making entries into the symbol_table. When a
non-declaration entry is made in the symbol_table:

The symbol table for current_level is scanned
to see if the entry has already been made. If
so, the procedure returns. If not, the entry
is made and its declared bit is set to "0"b.

When a symbol declaration is made in the symbol_table:

The symbol table for current_level is scanned
to see if the entry has already been made. If
so, the procedure ensures that the declared
bit of the entry is set to "1"b. If the entry
has not been made, the procedure makes the
entry and sets its declared bit to "1"b.
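The two cases amount to a small insert-or-upgrade operation on one level of symbol_table. The Python sketch below models one level as a dictionary from symbol_spell to the declared bit. LevelTable and make_entry are hypothetical names, and raising an error when the 20 entries are exhausted is an assumption about overflow handling.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_SYMBOLS = 20  # symbol_entry(20)


@dataclass
class LevelTable:
    """One level of symbol_table: proc_name_st plus its symbol entries."""
    proc_name_st: str
    # symbol_spell -> declared bit; dict order mirrors entry order
    entries: Dict[str, bool] = field(default_factory=dict)

    @property
    def symbol_count(self) -> int:
        return len(self.entries)


def make_entry(table: LevelTable, name: str, declaration: bool) -> None:
    """Enter name into the table for current_level, following the two cases."""
    if name in table.entries:
        if declaration:
            table.entries[name] = True  # a declaration always sets the bit
        return  # a non-declaration entry leaves an existing entry alone
    if table.symbol_count >= MAX_SYMBOLS:
        raise OverflowError("symbol_entry(20) is full")
    table.entries[name] = declaration


t = LevelTable("f")
make_entry(t, "x", declaration=False)  # x used before its DECLARE
make_entry(t, "x", declaration=True)   # DECLARE (x); sets the bit
make_entry(t, "x", declaration=False)  # a later use does not clear it
print(t.entries)  # {'x': True}
```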


var_list

var_list(20)                 char(10),

var_count                    fixed binary(17),

var_list is used to build up the names of
<identifier>s that occur in <variable_list>s. The number of
entries in var_list is indicated by var_count.


environment_needed_tree

1 environment_needed_tree(10),
  2 proc_name_env            char(10),
  2 env_count                fixed binary(17),
  2 env_entry(20)            char(10),

proc_count                   fixed binary(17),

This is the output from Pass-1. Our implementation
allows a maximum of ten uniquely named <procedure>s in any
pl6535 program, and the environment_needed_tree exists so as
to indicate the environment requirements of each
<procedure>. The entries have the following meaning:
proc_name_env

The name of the <procedure> for which the
environment information has been determined.

env_count

The number of variables that constitute the
environment requirements of proc_name_env.

env_entry

The actual names of the variables that
constitute the environment requirements of
proc_name_env.

proc_count

The number of <procedure>s in the pl6535
program being translated.


In  the  following  discussion,  we  present  the  basic 
semantic  actions  required  to  determine  the  environment 
requirements.  The  discussion  is  relative  to  the  BNF 
definition  of  pl6535,  and  any  BNF  rule  not  mentioned  has  no 
semantic action during this pass. The listings at the end
of the appendix should be consulted for details of the
implementation.

<procedure_head> (BNF-3)

Upon detecting a <procedure_head>:

a. An environment_needed_tree entry is
established for the <procedure>.

b. The <procedure> name is entered into the
symbol_table for current_level as being defined
(unless this is the first <procedure>, in which
case current_level = 0).

c. current_level is incremented, and the
symbol_table for current_level is initiated on
behalf of the <procedure> name.

d. The parameters of the <procedure> are entered
into the symbol_table as having been declared.

<variable_list> (BNF-4)

Detecting a <variable_list> results in var_list being
built up to contain the variables (<identifier>s) in the
list.


<procedure_end> (BNF-6)

Upon detecting a <procedure_end>:

a. The symbol_table for current_level is scanned
to determine if any symbols are undeclared at
this level. If undeclared symbols exist, they
are entered into the environment_needed list
of the <procedure> being ended and are also
entered in the symbol table of the
lexicographically enclosing <procedure>
(unless current_level = 1).

b. current_level is decremented by 1.
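Taken together, the <procedure_head> and <procedure_end> actions propagate each undeclared symbol outward one block at a time, so every <procedure> between a use and its declaration acquires that symbol in its environment. The Python sketch below models that flow with dictionaries in place of the fixed-size PL/I arrays. The class and method names are hypothetical, and the semantic actions of the other rules are simulated by direct make_entry calls.

```python
from typing import Dict, List, Optional, Tuple


class Pass1Environment:
    """Hypothetical model of the Pass-1 <procedure_head>/<procedure_end> actions."""

    def __init__(self) -> None:
        self.current_level = 0
        # level -> (proc_name_st, {symbol_spell: declared})
        self.symbol_table: Dict[int, Tuple[str, Dict[str, bool]]] = {}
        # proc_name_env -> env_entry list
        self.environment_needed_tree: Dict[str, List[str]] = {}

    def make_entry(self, name: str, declaration: bool,
                   level: Optional[int] = None) -> None:
        """Insert-or-upgrade one symbol in the table for level (default: current)."""
        entries = self.symbol_table[
            self.current_level if level is None else level][1]
        if name in entries:
            if declaration:
                entries[name] = True
        else:
            entries[name] = declaration

    def procedure_head(self, name: str, parameters: List[str]) -> None:
        self.environment_needed_tree[name] = []          # step a
        if self.current_level > 0:
            self.make_entry(name, declaration=True)      # step b
        self.current_level += 1                          # step c
        self.symbol_table[self.current_level] = (name, {})
        for p in parameters:                             # step d
            self.make_entry(p, declaration=True)

    def procedure_end(self) -> None:
        name, entries = self.symbol_table.pop(self.current_level)
        undeclared = [s for s, declared in entries.items() if not declared]
        self.environment_needed_tree[name].extend(undeclared)  # step a
        if self.current_level > 1:
            for s in undeclared:
                self.make_entry(s, declaration=False,
                                level=self.current_level - 1)
        self.current_level -= 1                                # step b


p1 = Pass1Environment()
p1.procedure_head("main", [])
p1.make_entry("x", declaration=True)    # DECLARE (x);
p1.procedure_head("inner", ["n"])
p1.make_entry("y", declaration=True)    # DECLARE (y);
p1.make_entry("y", declaration=False)   # y = n + x;
p1.make_entry("n", declaration=False)
p1.make_entry("x", declaration=False)
p1.procedure_end()                      # END inner;
p1.procedure_end()                      # END main;
print(p1.environment_needed_tree)  # {'main': [], 'inner': ['x']}
```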


<assignment_statement> (BNF-9)


Upon detecting an <assignment_statement>, we enter the
symbol on the left side of the equal sign into the symbol
table. In the case of the <procedure> call (BNF (9.1)), we
enter the arguments of the call into the symbol table, as
well as into var_list.


<factor> (BNF-12)

Upon detecting a <factor>, we know that we are building
up an <expression>. <identifier>s which are <factor>s are
entered into the symbol_table by the semantics of this rule.

<return_statement> (BNF-15)

Upon detecting a <return_statement>, we enter the name
of the variable being returned into the symbol_table.

<declare_statement> (BNF-17)

Upon detecting a <declare_statement>, we enter the
<identifier>s being declared into the symbol_table as having
been declared.


C.6 The pl6535 Translator: Pass-2


It is during Pass-2 that the actual translation of a
pl6535 program into its Common Base Language representation
takes place.

The structure of the Base Language is such that no
variable may be referenced, and no <procedure> may be
called, that is not part of the currently executing
<procedure>. Thus, external references and calls of a
higher level language must be translated into non-external
(i.e. local) references and calls of the Base Language.
These variables and <procedure>s must be passed down to the
referencing <procedure> during the calling sequence in the
same manner as arguments. A unique naming convention was
adopted for the selectors from the $ARG node (created as the
second argument of the apply instruction) and the $PAR node
(created in the call of a <procedure>, with the same value
as the argument of the apply instruction which invoked this
<procedure>). In this convention, arguments to be passed,
and formal parameters used, are given consecutive integer
selectors beginning with 1; all environment variables and
<procedure>s are selected with character strings spelling
their original names; and the value to be returned (on the
$ARG/$PAR structure) is given the selector "$RET".
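The convention can be pictured as a dictionary keyed by selectors. In the Python sketch below (an illustration only: call and inner are hypothetical names, and a dictionary stands in for the Base Language structure without any sharing semantics), positional arguments get the selectors 1, 2, and so on, environment variables are added under their names, and the callee leaves its result under "$RET".

```python
from typing import Callable, Dict, List, Union

Selector = Union[int, str]
Structure = Dict[Selector, object]


def call(procedure: Callable[[Structure], None], arguments: List[object],
         environment: Dict[str, object]) -> object:
    """Build an $ARG-like structure, invoke the callee, and fetch "$RET"."""
    arg: Structure = {i: v for i, v in enumerate(arguments, start=1)}
    arg.update(environment)  # environment variables selected by name
    procedure(arg)           # the callee sees the same structure as its $PAR
    return arg["$RET"]


def inner(par: Structure) -> None:
    # inner: PROCEDURE (n); ... RETURN (y); where x comes from the environment
    par["$RET"] = par[1] + par["x"]


print(call(inner, [4], {"x": 10}))  # 14
```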


C.6.1  Pass-2  Data  Structures 

Several data structures are used by Pass-2 in
generating the actual Common Base Language representation of
a pl6535 program.

text_reference_stack

text_reference_stack  is  used  the  same  way  in  Pass-2  as 
it  is  in  Pass-1,  so  that  its  description  under  Pass-1  also 
applies  here. 

environment_needed_tree

environment_needed_tree is the output from Pass-1, and
represents the environment requirements for each
<procedure>, which are necessary in the generation of the
Common Base Language translation. The discussion of
environment_needed_tree in Pass-1 also applies here.

var_list

var_list is used the same way in Pass-2 as it is in
Pass-1, so that its description under Pass-1 also applies
here.

label_list

1 label_list(10),
   2 label_count fixed binary(17),
   2 label(15),
      3 label_spell char(10) unaligned,
      3 label_def bit(1),
      3 label_loc fixed binary(17),
      3 label_usage_count fixed binary(17),
      3 label_usage(10) fixed binary(17),


label_list is used to handle the pl6535
<goto_statement>s as well as the Base Language gotos that
are generated during translation of the pl6535
<conditional_statement>s. A maximum of 10 <label>s can be
defined in any block. label_list is filled in as a
<procedure> is parsed, and its fields have the following
meaning:

label_count

The number of <label>s in the current
<procedure>.

label

For each <label> in the <procedure>, the
following information is ultimately
determined:

label_spell

The name of the <label>.

label_def

A bit string indicating whether the <label>
has been defined:

'0'b -> not defined.

'1'b -> defined.

label_loc

The relative line number in the current block
where the <label> is defined.

label_usage_count

The number of times that the <label> has been
referenced.

label_usage

The absolute Base code line numbers in which
the <label> is referenced.
base_code

1 base_code(200),
   2 opcode char(18) unaligned,
   2 adr1 char(10) unaligned,
   2 adr2 char(10) unaligned,
   2 adr3 char(10) unaligned,
   2 depth fixed binary(17),
   2 line_no fixed binary(17),
base_code_index fixed binary(17),


base_code is built up during Pass-2, and contains the
Common Base Language representation of the pl6535 program
being translated. base_code_index indicates the number of
Base Language instructions, and the fields of base_code have
the following meaning:

opcode

The Common Base Language primitive.

adr1

The first operand.

adr2

The second operand.

adr3

The third operand.

depth

The depth of the current <procedure>.

line_no

The line number in the Base code output text
at which the translation of the current
<procedure> began.


depth and line_no are needed for purposes of printing
the Common Base Language translation.


C.6.2  Pass-2  Semantics 


In addition to the per-rule semantics about to be
described, a set of internal procedures has been
implemented in Pass-2 to make the Pass-2
implementation easier to follow. They appear in the
listings at the end of the appendix, and are commented so
that their functions should be apparent to the reader.

In  the  following  discussion,  we  present  the  basic 
semantic  actions  required  to  generate  the  Common  Base 
Language  translation  of  a  pl6535  program.  The  discussion  is 
relative  to  the  BNF  definition  of  pl6535,  and  any  BNF  rule 
not  mentioned  has  no  semantic  action  during  this  pass.  The 
listings  at  the  end  of  the  appendix  should  be  referenced  for 
details  of  the  implementation. 

<procedure_head>/<procedure_end> (BNF-3/BNF-6)

pl6535 is a block-structured language and the Base
Language is tree-structured. Thus, detecting a
<procedure_head> or <procedure_end> must cause a
corresponding change in the depth of the Base code.
Therefore, when detecting such <statement>s, current_level is
changed and the output routine is notified to change the
indentation of any Base code generated thereafter.
<procedure_head>s also represent the entries into the
<procedure>, and thus we must output Base code to select the
parameter values (select $PAR, i,
name_of_formal_parameter(i); i=1 to number of parameters).
We must also determine and output Base code to select the
environment needed by the <procedure> (select $PAR, name(i),
name(i); where each name represents an element of
environment_needed_tree for this <procedure>).
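A minimal Python sketch of the entry code described above, assuming each Base instruction is held as an (opcode, adr1, adr2, adr3) tuple mirroring the base_code fields, and that this <procedure>'s part of environment_needed_tree has already been flattened into a list of names. The function name is hypothetical; the operand order follows the select forms given in the text.

```python
from typing import List, Tuple

# (opcode, adr1, adr2, adr3), mirroring the base_code fields.
Instruction = Tuple[str, str, str, str]


def procedure_entry_code(formal_parameters: List[str],
                         environment_needed: List[str]) -> List[Instruction]:
    """Hypothetical sketch of the Base code emitted for a <procedure_head>."""
    code: List[Instruction] = []
    # select $PAR, i, name_of_formal_parameter(i); i = 1 to number of parameters
    for i, name in enumerate(formal_parameters, start=1):
        code.append(("select", "$PAR", str(i), name))
    # select $PAR, name(i), name(i); one per environment_needed_tree element
    for name in environment_needed:
        code.append(("select", "$PAR", name, name))
    return code


for line in procedure_entry_code(["m", "n"], ["total"]):
    print(*line)
# select $PAR 1 m
# select $PAR 2 n
# select $PAR total total
```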

The  semantics  of  <variable_list>  during  Pass-2  is  the 
same  as  during  Pass-1,  so  that  its  description  under  Pass-1 
also  applies  here. 

<procedure> call (BNF (9.1))

Upon detecting a <procedure> call, we must output Base
code to create a $ARG structure and then link each of the
arguments to the structure via integer selectors. If the
call is not recursive, then we must output Base code to move
the text of the called <procedure> from the procedure
structure to the data or local structure (see Den 72 for a
description of the procedure, data, and local structures).
If the call is recursive, then the text will not be
accessible in the <procedure> structure but will have been
passed as environment needed on the $PAR structure, and
selected during this <procedure>'s invocation. The test to
check for recursive calls is to see if the <procedure> to be
called is in the calling <procedure>'s
environment_needed_tree.

Once the appropriate procedure structure has been
accessed, Base code must be generated to apply execution of
the procedure structure to its associated argument
structure. We must subsequently output Base code
instructions to select the returned value from the argument
structure and to delete the argument structure.
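The call sequence just described can be sketched in Python as follows. This is not the pl6535 translator's code: "create", "link" and "move" are placeholder mnemonics we assume for illustration (the text names only apply, select and delete), and the caller's environment_needed_tree is reduced to a set of names for the recursion test.

```python
from typing import List, Set, Tuple

Instruction = Tuple[str, ...]


def call_sequence(callee: str, arguments: List[str], result: str,
                  caller_environment_needed: Set[str]) -> List[Instruction]:
    """Hypothetical sketch of the Base code for a <procedure> call.

    "create", "link" and "move" are placeholder mnemonics; apply,
    select and delete are the primitives named in the text.
    """
    code: List[Instruction] = [("create", "$ARG")]
    # Link each argument to $ARG under consecutive integer selectors.
    for i, argument in enumerate(arguments, start=1):
        code.append(("link", "$ARG", str(i), argument))
    # Recursion test: is the callee in the caller's environment_needed_tree?
    if callee not in caller_environment_needed:
        # Non-recursive: copy the callee's text from the procedure
        # structure to the data or local structure.
        code.append(("move", callee, "procedure", "local"))
    # Recursive: the text was already selected from $PAR at entry.
    code.append(("apply", callee, "$ARG"))
    code.append(("select", "$ARG", "$RET", result))  # fetch returned value
    code.append(("delete", "$ARG"))                   # discard the argument structure
    return code


for line in call_sequence("f", ["a", "b"], "t1", set()):
    print(*line)
```

Only the move step depends on the recursion test; everything from apply onward is the same in both cases.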

<return_statement> (BNF-15)

Upon detecting a <return_statement>, we must generate
Base code to link the returned variable to the parameter
structure, and return to the code following the invoking
apply instruction.

<expression>/<assignment_statement>

When any of the binary operators is detected (BNF
(10.1), BNF (10.2), BNF (11.1), BNF (11.2)), a Base code
instruction specifying the corresponding operation is
generated, with its first two arguments being popped off the
expression_stack. A unique temporary name is used for the
resulting sub-expression, which is then pushed onto the
expression_stack. <integer> constants (BNF (12.3)) must be
given unique names, created using the const instruction, and
pushed onto the expression_stack. <identifier>s encountered
in <expression>s (BNF (12.2)) are simply pushed onto the
expression_stack. <assignment_statement>s (BNF (9.2)) such
as "x=a;" must be translated into an assign Base code
primitive. However, <assignment_statement>s such as "x=8;"
can be translated into a single const primitive involving
no temporaries. Thus, the semantics for BNF (9.2) must
either generate an assign instruction, with one argument
coming from the expression_stack, or perform some code
optimization by modifying the previous line of Base code so
that the temporary name is never used. The semantics
associated with the <assignment_statement> is also
responsible for some garbage collection, which amounts to
deleting all temporaries created in translating the
<expression>.
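The following Python sketch walks through these rules: constants get unique names via const, identifiers are pushed unchanged, each binary operator pops two operands and pushes a fresh temporary, and an assignment either retargets the previous line or emits an assign, then deletes the remaining temporaries. It is an illustration under stated assumptions: the <expression> arrives as a small tree rather than as parser reductions, the opcode names add/sub/mul/div and the operand order of const and assign are our own choices, and the class name is hypothetical.

```python
from typing import List, Tuple, Union

# An <expression> as a small tree: identifier, integer, or (op, left, right).
Expr = Union[str, int, Tuple[str, "Expr", "Expr"]]

# Placeholder opcode names for the binary operators (assumed, not from the text).
OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}


class ExpressionTranslator:
    def __init__(self) -> None:
        self.code: List[List[str]] = []
        self.expression_stack: List[str] = []
        self.temporaries: List[str] = []   # names created for this statement
        self.counter = 0

    def new_name(self) -> str:
        """Create a unique temporary name and remember it for deletion."""
        self.counter += 1
        name = f"t{self.counter}"
        self.temporaries.append(name)
        return name

    def expression(self, e: Expr) -> None:
        if isinstance(e, int):              # <integer>: const into a unique name
            name = self.new_name()
            self.code.append(["const", str(e), name])
            self.expression_stack.append(name)
        elif isinstance(e, str):            # <identifier>: push as-is
            self.expression_stack.append(e)
        else:                               # binary operator
            op, left, right = e
            self.expression(left)
            self.expression(right)
            right_name = self.expression_stack.pop()
            left_name = self.expression_stack.pop()
            result = self.new_name()
            self.code.append([OPCODES[op], left_name, right_name, result])
            self.expression_stack.append(result)

    def assignment(self, target: str, e: Expr) -> None:
        self.expression(e)
        value = self.expression_stack.pop()
        if value in self.temporaries and self.code and self.code[-1][-1] == value:
            # Optimization: the previous line computed the value, so make it
            # write the target directly; the temporary is never used.
            self.code[-1][-1] = target
            self.temporaries.remove(value)
        else:
            self.code.append(["assign", target, value])
        # Garbage collection: delete every remaining temporary.
        for name in self.temporaries:
            self.code.append(["delete", name])
        self.temporaries.clear()


tr = ExpressionTranslator()
tr.assignment("x", ("+", "a", 4))
for line in tr.code:
    print(*line)
# const 4 t1
# add a t1 x
# delete t1
```

With these rules "x=8;" becomes the single line const 8 x, while "x=a;" has no previous line to retarget and so becomes assign x a.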

<conditional_statement> (BNF-13)

A conditional <statement> is first recognized by BNF-14
(<equality>), where the ifgoto Base code is generated. The
compare operands of ifgoto are taken from the
expression_stack, and the goto location is the present
location plus an offset. This is followed by code to delete
all temporaries used in the two expressions (in case the
condition failed) and a goto with a blank address field.
This address field represents the location of the start of
the Base code translation of the next <statement>, which is
not known until BNF-13 is applied. Thus, the semantics of
BNF-13 fills in this address field with the correct
location. The jump location of the ifgoto follows, and
deletes are generated to perform garbage collection if the
jump is taken.
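The resulting code layout can be sketched in Python as two hooks, one per BNF rule. We assume here that ifgoto jumps when the equality holds, that Base code lines are numbered from 0 by list index, and that the controlled <statement> is translated between the two hooks; the function names are hypothetical.

```python
from typing import List

Code = List[List[str]]


def on_equality(code: Code, left: str, right: str,
                temporaries: List[str]) -> int:
    """BNF-14: emit ifgoto, fall-through deletes, a blank goto, jump-target deletes.

    Returns the index of the goto whose address BNF-13 must fill in.
    """
    here = len(code)
    # Jump target = present location + offset past the deletes and the goto.
    code.append(["ifgoto", left, right, str(here + len(temporaries) + 2)])
    for name in temporaries:          # condition failed: clean up, then skip
        code.append(["delete", name])
    goto_index = len(code)
    code.append(["goto", ""])         # address not known yet
    for name in temporaries:          # jump taken: clean up, then run statement
        code.append(["delete", name])
    return goto_index


def on_conditional_statement(code: Code, goto_index: int) -> None:
    """BNF-13: the next <statement> starts at the next free line."""
    code[goto_index][1] = str(len(code))


code: Code = []
g = on_equality(code, "a", "t1", ["t1"])
code.append(["assign", "x", "a"])     # translation of the controlled <statement>
on_conditional_statement(code, g)
for i, line in enumerate(code):
    print(i, *line)
```

The ifgoto offset is known immediately because the number of temporaries is known at BNF-14; only the goto has to wait for BNF-13.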

<goto_statement> (BNF-16)

Upon detecting a <goto_statement>, a check is made to
see if the destination <label> has been defined. If so,
translation is straightforward: a Base Language goto
instruction with the argument taken from
label_list.label_loc. If not, the address field of the
Base code goto instruction must be left blank and the
location of the instruction entered into the label_list. As
<label>s are defined (BNF-8), the value of the <label> (the
next base_code instruction-selector number) is entered into
the label_list, and any existing references (i.e., goto
instructions with blank argument fields) are then filled in
using the information previously stored in the label_list
for that <label>.
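This is a classic back-patching scheme. The Python sketch below mirrors it with a dictionary keyed by label name in place of the fixed-size label_list array, so label_spell is the key and label_usage_count is simply the length of label_usage. Line numbers are list indices starting at 0; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelEntry:
    """One label_list.label element (label_spell is the dict key)."""
    label_def: bool = False
    label_loc: int = 0
    label_usage: List[int] = field(default_factory=list)


class GotoTranslator:
    def __init__(self) -> None:
        self.base_code: List[List[str]] = []
        self.label_list: Dict[str, LabelEntry] = {}

    def goto_statement(self, label: str) -> None:
        entry = self.label_list.setdefault(label, LabelEntry())
        if entry.label_def:
            # Backward reference: the destination is already known.
            self.base_code.append(["goto", str(entry.label_loc)])
        else:
            # Forward reference: leave the address blank and remember the line.
            entry.label_usage.append(len(self.base_code))
            self.base_code.append(["goto", ""])

    def define_label(self, label: str) -> None:
        entry = self.label_list.setdefault(label, LabelEntry())
        entry.label_def = True
        entry.label_loc = len(self.base_code)   # next instruction number
        for line in entry.label_usage:           # back-patch blank gotos
            self.base_code[line][1] = str(entry.label_loc)


t = GotoTranslator()
t.goto_statement("l1")                    # forward reference, line 0
t.base_code.append(["assign", "x", "y"])  # line 1
t.define_label("l1")                      # l1 is line 2; line 0 is patched
t.base_code.append(["assign", "y", "x"])  # line 2
t.goto_statement("l1")                    # backward reference, line 3
print(t.base_code)
```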
C.7 Examples of pl6535 Programs and Their Translation


At the end of the appendix are examples of two pl6535
programs and their translation. Program p1 is taken
directly from Flinker. Program p5 was written to test all
of the features of the language, especially those that were
not treated in previous works. In the paper by Altman,
Gearing, and Weekly (AGW 72), a manual interpretation is
given for a portion of p5. Since our emphasis in this appendix
is on the pl6535 translator, we have omitted the
interpretation.
In implementing a "useful" programming language, such
as Algol or PL/I, the following issues will have to be
faced, some by the Common Base Language theory, and some by
our translator:

The  handling  of  non-local  goto's. 

Optimization  of  the  Common  Base  Language  code. 

Resolving  of  external  function  calls  without 
creating  cycles. 

Defining  the  means  of  interaction  with  the 
Base  Language  Machine  -  operating  system, 
input/output. 

Parallelism  in  computation  and  its  implication 
for  language  translation  and  interpretation. 

Generalized  data  structures,  arrays,  etc. 

Symbol  manipulation,  character  handling,  and 
general  data  type  conversion. 

Of  course,  the  implementation  of  any  language,  "useful" 
or  not,  will  have  to  wait  on  the  implementation  of  a  Common 
Base  Language  interpreter.  And  so,  even  though  the  Common 
Base  Language  is  still  in  its  infancy,  perhaps  the 
development  of  a  primitive  interpreter  is  one  of  the  next 
steps  that  should  be  taken. 
Appendix D


D.1 Introduction

In  this  appendix,  we  present  the  primary  and  lexical 
grammar  of  a  large  subset  of  the  IBM  Laboratory  Vienna's 
specification  of  the  concrete  syntax  of  PL/I  (AOU  68). 

D.2 PL/I Primary Grammar

The  primary  grammar  of  the  PL/I  subset  is  given  at  the 
end  of  the  appendix.  It  is  a  highly  inclusive  subset  of 
the  full  Vienna  specification,  including  declarations, 
input/output,  and  on-conditions.  The  grammar  is  the  most 
complex  grammar  yet  submitted  to  LIS. 

In  developing  the  primary  parser  for  the  PL/I  subset, 
it  was  discovered  that  the  Vienna  definition  contains  at 
least  two  areas  of  syntactic  ambiguity.  The  first  area  of 
ambiguity  occurs  between  the  definition  of  labellist  and 
reference,  as  they  may  both  appear  at  the  beginning  of  a 
statement.  The  Vienna  definition  of  these  constructs  is 
indicated  below. 
labellist ::=
   { < identifier | initial-label > : }...
initial-label ::=
   [{ identifier ( {, signed-integer}... ) . }...]
   identifier ( {, signed-integer}... )
   [{ . identifier }...]

reference ::=
   [reference ->] basic-reference
basic-reference ::=
   {. unqualified-reference}...
unqualified-reference ::=
   identifier [( {, {expression | *}}... )]


As an example of the ambiguity inherent in these
definitions, consider the following partial phrase at the
beginning of a statement:

susie(1, 2, 3, 4, 5, 6, 7, 8

Is the partial phrase the beginning of an initial-label
or the beginning of a reference on the left side of an
assignment statement? The illustrated ambiguity cannot be
resolved with finite look-ahead, and the Vienna definition
is therefore not LR(k). To circumvent this problem, we
defined <labellist> as shown in the LIS Language Definition
at the end of the appendix. Our definition admits "illegal"
labels, which are easily detected by semantics.
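Why no fixed look-ahead suffices can be seen by counting tokens. The Python sketch below assumes a lexical analyzer that returns identifiers, integers, "(", ",", ")", ":" and "=" as separate tokens, and compares the two readings of susie(1, ..., n). The parser must decide how to reduce "1" (signed-integer or expression) at token 2, but the first token that tells the readings apart is at position 2n+2, so the gap grows without bound as the subscript list lengthens. The helper names are hypothetical.

```python
from typing import List


def subscripted(n: int) -> List[str]:
    """Tokens of  susie(1, 2, ..., n)  as an assumed lexer would produce them."""
    tokens = ["susie", "("]
    for i in range(1, n + 1):
        if i > 1:
            tokens.append(",")
        tokens.append(str(i))
    tokens.append(")")
    return tokens


def first_difference(a: List[str], b: List[str]) -> int:
    """Index of the first token at which the two sequences differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


def label_reading(n: int) -> List[str]:       # susie(1, ..., n): ...
    return subscripted(n) + [":"]


def reference_reading(n: int) -> List[str]:   # susie(1, ..., n) = ...
    return subscripted(n) + ["="]


for n in (1, 8, 100):
    print(n, first_difference(label_reading(n), reference_reading(n)))
# 1 4
# 8 18
# 100 202
```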

The second area of ambiguity is more obvious and
involves the following definition of datalist:

datalist ::=
   {, datalist-element}...
datalist-element ::=
   (datalist DO do-specification) | expression
Our remedy for this ambiguity is also indicated in the
LIS Language Definition at the end of the appendix.

Once  the  ambiguities  were  corrected,  LIS  was  able  to 
produce  a  highly  efficient  parser  for  the  PL/I  subset 
(Chapter  III,  Section  III.C.2).  This  application 
illustrates  the  use  of  LIS  both  as  a  language  development 
tool  and  as  an  implementation  facility.  As  a  language 
development  tool,  the  system  identified  areas  of  syntactic 
ambiguity,  which  are  difficult  for  both  programmer  and 
machine.  As  an  implementation  facility,  it  may  be  noted 
that  the  author  was  able  to  implement  the  parsers  (lexical 
and  primary)  for  the  PL/I  subset  in  less  than  one  week. 
This  implementation  time  would  have  been  significantly  less, 
had  it  not  been  for  the  clerical  task  of  entering  the  PL/I 
grammar  into  a  Multics  segment. 

D.3 PL/I Lexical Grammar

The  PL/I  lexical  grammar  given  in  the  LIS  Language 
Definition  at  the  end  of  the  appendix  is  the  most 
comprehensive  lexical  grammar  yet  submitted  to  LIS.  It 
includes  all  of  the  Vienna  lexical  constructs  except 
for  sterling-constant,  picture-specification,  and  comment. 
Comment  is  more  appropriately  implemented  by  hand  in  LIS 
Processor  Control. 
Appendix E


LIS Application: express


E.1 Introduction

The  express  language  was  developed  by  Interactive 
Planning  Systems,  Incorporated  as  an  interactive  command 
language  for  their  financial  planning  system.  In  this 
appendix,  we  introduce  the  Interactive  Planning  System 
(IPS)  and  describe  the  way  in  which  the  express  language  is 
utilized  in  the  development  of  financial  planning  models. 
We  then  discuss  the  application  of  LIS  to  the  development  of 
an  express  processor,  and  conclude  the  appendix  with  an 
express  console  session. 

E.2 An Introduction to IPS

In  this  section,  we  give  a  brief  introduction  to  the 
Interactive  Planning  System,  taking  our  discussion  from 
the  Interactive  Planning  System  User  Guide  (IPS  72). 

In the last four years, online systems have become
available to manipulate and analyze management data. These
systems, usually offered by timesharing companies, have
several characteristics:


a. Users buy the time to run the packages, but
they are not guaranteed results.

b. Users pay high prices, but get limited
support.

c. Users don't know what a run will cost before
it begins, but must pay for it anyway.

d. Basic capabilities are lacking. No system has
had available the four components of a
management decision support system:

i.  Model  Builder 

ii.  Statistical  analyzer 

iii.  Data  base  manager 

iv.  Report  generator 


What is IPS?

The IPS system has four components:

a. A Model Builder

To let the user quickly and economically build
and test new models.

b. An Analysis System

Multivariate  and  stepwise  regression  plus 
tests  of  significance,  analysis  of  variance, 
data  transformation,  plots,  and  search 
capabilities. 

c. A Data Management System

To enter and edit data, to establish and
modify the user's protection system for models
and data, to store plots, scatter-diagrams and
models when necessary.

d. A Report Generator

An  easy  way  to  build  complex  reports  -  even 
reports  which  include  a  combination  of 
graphics  and  tabular  data. 
E.3 The express Language


The following description of the express language is
obtained by typing "help;" when running IPS under the
express processor.


The express language allows you to execute IPS without going
through the normal question and answer dialogue. The
purpose of this note is to acquaint you with the general
character of the express language.

Examples:

perform model Dave using data A1 and A2 for 6 periods output
the results thru report Ness reporting periods from 2 to 6;
print model Dave and Ness;

update model dave run 2 with model ness run 3;
output model Dave run 7 thru report Ness;

The commands are made up of basic commands, model, data, and
report specifiers, qualifiers, and noise words. Noise words
are simply disregarded by the system and are allowed for
ease of expression only. Anything which is not understood
is regarded as "noise".

- Perform  Command 

perform <model spec> <length spec> <data spec> <report spec>
<destination spec> ;

This command tells the system to run the indicated model,
using the indicated data, and to produce a report using the
indicated report to be output to the indicated destination.
If the <model spec> doesn't specify a run number, then a new
run will be made.

- Output  Only  Command 

<model spec> <report spec> <destination spec> ;
If  no  command  is  found  on  the  line  but  a  <model  spec>  is 
given  which  includes  a  run  number  of  an  existing  run,  then 
that  run  is  output  according  to  the  report  specified. 

- Pack  Command 

pack <model spec> <data spec> <report spec> ;

pack all models ;

pack all reports ;

pack all data ;

pack all files ;
The pack command causes all of the files associated with a
given model, data file, or report to be compressed and
stored away. This saves a substantial amount of disk
storage. The express system will automatically unpack any
files that the user refers to in any express command.
Unpacking will take a little while to perform, but the
economical use of storage makes this often worthwhile. The
pack all forms operate as their name implies, to pack all
files (with pack all files) or all files of some certain
class (as in pack all models).


- Update  Command 

update <data spec> with <data spec> ;

The update command will cause the contents of the first
<data spec> to be updated with the contents of the second
<data spec>.


- Finish  Command 

end ;
done ;

Terminates express and returns control to "model data or
analyze".

- Error Command

Causes a descriptive message concerning the last error which
has occurred to be typed on the console.


- Help  Command 

help ;

Types  (this)  descriptive  text. 


- <model  spec> 

model  <name> 
model  <name>  run  <n> 
run  <n>  model  <name> 

These are alternative forms for the <model spec>. The
first specifies a model only (the system will generate a new
run number if it is needed). The others specify a particular
model and run.


- <data spec>

data <name>
no data

The first of these specifies the data file name. The second
specifies that no data is to be used (this is different from
not mentioning the data, in that the system requires a data
spec for making a run).

- <report  spec> 

report  <name> 

report <name> <report qualifier>

This specifies a particular report file.


- <report  qualifier> 

from  <n>  to  <n> 
from  <n> 


The from qualifier gives the desired beginning period of the
report. It is assumed to be 1 if no from is given. The to
qualifier gives the desired ending period of the report. It
is assumed to be the last period in the run if no to
qualifier is given.
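The two defaults can be captured in a few lines of Python. This is only a sketch of the rule as stated in the help text, not code from IPS or from the LIS express processor; the function name and parameters are hypothetical.

```python
from typing import List, Optional


def report_periods(run_length: int, start: Optional[int] = None,
                   end: Optional[int] = None) -> List[int]:
    """Periods covered by a <report qualifier>, applying the stated defaults.

    from defaults to period 1; to defaults to the last period of the run.
    """
    first = 1 if start is None else start
    last = run_length if end is None else end
    return list(range(first, last + 1))


print(report_periods(6, start=2))  # [2, 3, 4, 5, 6]
print(report_periods(6))           # [1, 2, 3, 4, 5, 6]
```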


- <length  spec> 

for  <n> 

This  specifies  a  run  of  <n>  periods. 


- <destination spec>

onto tty
onto teletype
on tty
on teletype

into <name>

The first four of these specify that the report is to be
typed on the teletype. At the moment you might as well not
give them, because that is what will be assumed if no into
appears. Later we will allow you to change the default
width of your terminal using this specification.

The into specification causes the report to be filed
automatically as report <name>.


- General  Structures 

Where it makes sense to specify more than one item (for
example, in a data specification in a perform statement), you
may do that by separating items by the word "and". For
example:


data  Dave  data  Ness  data  answer 

could be replaced by:

data  Dave  and  Ness  and  answer 
The break character "," (wherever it appears) is equivalent
to the word "and". Thus the above example is identical to:

data  Dave,  Ness  and  answer 

(Note: not data Dave, Ness, and answer, as this equals data
Dave and Ness and and answer.)
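The equivalence between "," and "and" amounts to a simple rewriting of the word stream, sketched below in Python. We assume blank-separated words and treat "," as the only break character of interest; the function name is hypothetical and this is not the express processor's actual lexical grammar.

```python
from typing import List


def normalize_break_characters(command: str) -> List[str]:
    """Split a command into words, treating each "," as the word "and".

    Assumes words are separated by blanks and that "," is the only
    break character of interest.
    """
    words: List[str] = []
    for word in command.replace(",", " , ").split():
        words.append("and" if word == "," else word)
    return words


print(" ".join(normalize_break_characters("data Dave, Ness and answer")))
# data Dave and Ness and answer
print(" ".join(normalize_break_characters("data Dave, Ness, and answer")))
# data Dave and Ness and and answer
```

The second line shows why the note warns against writing ", and": the comma and the word each contribute an "and".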

The order of specification within a command doesn't matter
much. Thus you could say perform writing of report Ness for
6 periods from 2 using data A1, A2 with model Dave; (which
is equivalent to the first example in this note).


E.4 The express Processor


The  express  processor  developed  using  LIS  is  not  the 
actual  express  processor  employed  in  IPS.  IPS  was 
developed  quite  independently  of  LIS,  and  the  purpose  of  the 
present  application  is  simply  to  illustrate  the  utilization 
of  LIS  in  the  development  of  interactive  languages  for 
management  information  systems.  This  being  the  case,  the 
semantics  of  the  language  is  limited  to  the  simple 
manipulation  and  display  of  express  command  constructs.  The 
LIS  Language  Definition  of  express  is  given  at  the  end  of 
this  appendix. 

There  are  two  significant  differences  between  the 
structure  of  the  parsers  (lexical  and  primary)  produced  for 
the  express  language  and  the  parsers  produced  for  the 
languages  developed  in  the  previous  appendices.  First,  the 
parsers  for  express  are  interactive  whereas  the  previous 
parsers  were  non-interactive.  This  is  a  minor 
implementation  problem,  and  simply  involves  the 
implementation  of  a  procedure  for  reading  express  command 
lines  and  activating  the  parsers. 

The second difference is more interesting and involves
handling the requirement for "noise" words in the express
commands. Whereas the acceptance of "noise" words lends
much flexibility to the express dialogue and makes
possible the specification of express commands that look
very much like English sentences, this flexibility is not
without cost. The cost is associated with the severe
restrictions placed on the ability to detect and report
syntax errors, with the accompanying risk of accepting
ambiguous commands. This, however, is true regardless of
the parsing strategy employed, and is apparently a language
tradeoff that the designers of express were willing to
accept. The actual implementation of the "noise" words
required two modifications to LIS, one to the Processor
Generator and the other to LIS Processor Control. The
modification to the Processor Generator eliminates the
computation of default look-ahead transitions. This is
necessary in order to maintain deterministic parsing. The
modification to LIS Processor Control is such that, when
the current input symbol matches none of the transitions
from the current read state or look-ahead state, the control
procedure fetches the next input symbol rather than invoking
the error handling facility.
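The effect of the Processor Control modification can be sketched with a tiny table-driven recognizer in Python. The transition table, the token classes <name> and <n>, and the function names are all hypothetical; the table covers only "perform model <name> [for <n>] ;", and only the control-procedure change is modeled, not the Processor Generator change. With skip_noise=False an unmatched symbol goes to error handling; with skip_noise=True it is dropped and the next symbol is fetched.

```python
from typing import Dict, List, Tuple

# Hypothetical read-state table for:  perform model <name> [for <n>] ;
TRANSITIONS: Dict[int, Dict[str, int]] = {
    0: {"perform": 1},
    1: {"model": 2},
    2: {"<name>": 3},
    3: {"for": 4, ";": 6},
    4: {"<n>": 5},
    5: {";": 6},
}
ACCEPT = 6


def symbol_class(token: str, state: int) -> str:
    """Classify a token as the terminal the current state would read (assumed lexer)."""
    if token in TRANSITIONS[state]:
        return token
    if token.isdigit():
        return "<n>"
    if token.isalpha():
        return "<name>"
    return token


def parse(tokens: List[str],
          skip_noise: bool = True) -> Tuple[Dict[str, str], List[str]]:
    state = 0
    values: Dict[str, str] = {}
    noise: List[str] = []
    for token in tokens:
        if state == ACCEPT:
            break
        terminal = symbol_class(token, state)
        next_state = TRANSITIONS[state].get(terminal)
        if next_state is None:
            if not skip_noise:
                # Unmodified control: hand the symbol to error handling.
                raise SyntaxError(f"unexpected {token!r} in state {state}")
            # Modified control: treat the symbol as noise and fetch the next.
            noise.append(token)
            continue
        if terminal in ("<name>", "<n>"):
            values[terminal] = token
        state = next_state
    if state != ACCEPT:
        raise SyntaxError("command is incomplete")
    return values, noise


print(parse("perform the model division for 6 periods ;".split()))
# ({'<name>': 'division', '<n>': '6'}, ['the', 'periods'])
```

The sketch also shows the cost noted above: in "perform model the division ;" the word "the" would be accepted as the model name, and "division" would be skipped as noise, with no error reported.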
The following is an express console session. The
express processor is invoked by typing "express" at Multics
command level. The express processor signals that it is
ready to accept the next command sequence by typing "*".
Multiple commands are allowed per command sequence, and
command sequences may extend over several lines. The end of
an express command sequence is indicated by two carriage
returns. The express console session:


express 

*perform model division;

Perform Command:
<model spec>
division

*perform model division onto tty using data east and west
data south for 6 periods output the results thru report
consolidated reporting periods from 3 to Z;

The last of the above commands cannot be recognized,
please reissue.

*error;

Error Command:
error

*perform model division onto tty using data east and west
data south for 6 periods output the results thru report
consolidated reporting periods from 3 to 6;

Perform Command:
<model spec>
division
<destination spec>
tty
<data spec>
east
west
<data spec>
south
<length spec>
6
<report spec>
consolidated
from 3
to 6

*output the results of model division run 7 thru
report consolidated;

Output Command:
<model spec>
division
<model spec>
7
<report spec>
consolidated

*update data east and west with data north and south;

Update Command:
<data spec>
east
west
<data spec>
north
south

*pack model industry data aggregate report final;

Pack Command:
<model spec>
industry
<data spec>
aggregate
<report spec>
final

*perform model industry run 5 data aggregate for 6
report industry from 1962 to 1973 into conclusions;
pack model industry data aggregate report industry;


Perform Command:
<model spec>
industry
<model spec>
5
<data spec>
aggregate
<length spec>
6
<report spec>
industry
from 1962
to 1973
<destination spec>
conclusions

Pack Command:
<model spec>
industry
<data spec>
aggregate
<report spec>
industry

*Activate IPS to perform the model industry using run
5 with data aggregate for 6 periods report industry
results from 1962 to 1973 place results into
conclusions file; pack the model industry with data
aggregate report industry;

Perform Command:
<model spec>
industry
<model spec>
5
<data spec>
aggregate
<length spec>
6
<report spec>
industry
from 1962
to 1973
<destination spec>
conclusions

Pack Command:
<model spec>
industry
<data spec>
aggregate
<report spec>
industry

*finish the current activation of IPS;

Finish Command:
finish